I have a Scala data processing tool that is failing with a java.lang.OutOfMemoryError. The tool needs to make a couple of passes over a large data file (the one I'm working on is over 700 MB), so it would be convenient if the whole thing could be held in memory.
I run the tool from the command line or from a Bash script using the "scala" runner. How do I increase the JVM heap size for this? I've tried passing -Xmx1024m, but the runner does not recognize that argument. I'm using a nightly build of Scala 2.8.0 (r18678).
You can pass options to the JVM via -J. E.g.,
scala -J-Xmx2g ...
The JAVA_OPTS environment variable is used to specify the options passed to the java command. See the Scala manpage for details.
Based on Joe's suggestion, I looked at the scala program, which is a Bash script that calls java with various Scala-specific arguments. It allows extra command-line arguments to be passed to java through the JAVA_OPTS environment variable, so you can increase the heap size like this:
JAVA_OPTS="-Xmx2g" scala classname arguments...
Edit the JAVA_OPTS environment variable.
As a hack, you can edit the scala/bin/scala script to change the parameters passed to the java command. Not pretty.
I want to write a command-line interface for my application. What are the best tools for this job? I want a REPL interface like the sbt console.
Simplest way IMHO:
use https://github.com/scopt/scopt to parse the command-line arguments, and then
use http://ammonite.io/#Embedding to start a Scala REPL on steroids with some classes already imported and objects already instantiated.
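A minimal sketch of that combination, assuming scopt's classic OptionParser API and Ammonite's embedding entry point (ammonite.Main); the Config fields, option names, and tool name below are made up for illustration, and both libraries need to be on the classpath:

import scopt.OptionParser

// Hypothetical options for the tool; adjust to your application.
case class Config(verbose: Boolean = false, name: String = "session")

object Cli {
  def main(args: Array[String]): Unit = {
    val parser = new OptionParser[Config]("mytool") {
      opt[Unit]("verbose").action((_, c) => c.copy(verbose = true)).text("enable verbose output")
      opt[String]("name").action((x, c) => c.copy(name = x)).text("session name")
    }
    // If parsing succeeds, drop into an Ammonite REPL with the parsed config bound as `config`.
    parser.parse(args, Config()).foreach { config =>
      ammonite.Main().run("config" -> config)
    }
  }
}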
I am curious whether there is any way to tell the sbt console's :paste command to execute in a specific package.
So far I have read about using :paste -raw, but that did not do the trick.
My problem: to use Spark's convenient default-params writable support, which is only available internally in Spark's private API, I chose to place my custom estimator in Spark's namespace. This works fine when run locally, but the great interactive experience of the Scala REPL is rather spoiled, because I cannot run my estimator, which resides in org.apache.spark.ml.
One option is to take your estimator, use sbt assembly to turn it into a JAR, and then upload that JAR when you run spark-shell or run with spark-submit. Is there a reason why you need to use the Spark namespace? Why not use your own namespace?
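A rough sketch of that workflow, assuming the sbt-assembly plugin (the plugin version, jar path, and class name below are illustrative only; check the sbt-assembly docs for your sbt version):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.2.0")

// Then build the fat jar and hand it to Spark:
//   sbt assembly
//   spark-shell --jars target/scala-2.12/my-estimator-assembly-0.1.jar
//   spark-submit --class my.pkg.Main target/scala-2.12/my-estimator-assembly-0.1.jar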
I have a self-defined, text-editable ("datafile") file format (which also uses certain Python types such as dicts, tuples, lists, etc.) for providing argument data to my Python scripts. These arguments are later used in my main Python script.
Currently, at the start of the main program, I consolidate all such datafiles (using os.walk) and parse them every time, which takes a lot of time.
This is my issue!
Is there a mechanism in Eclipse to run a Python script (a standalone parser) with the above "datafile" as an argument, so that syntax errors are checked immediately after I save the file? That way I would not have to check for syntax errors while running the main program.
Is this possible?
I am using the Eclipse IDE with PyDev for my development work.
Regards,
As per the Scala tutorials, we need to compile Scala code using scalac filename.scala before executing it using scala filename. But when I tried scala filename.scala directly, it ran and produced the output.
Why is that? Is compiling with scalac not required to run the code? Could someone please explain this.
Thanks.
scala -help was really helpful. It says
A file argument will be run as a scala script unless it contains only
self-contained compilation units (classes and objects) and exactly one
runnable main method. In that case the file will be compiled and the
main method invoked. This provides a bridge between scripts and standard
scala source.
Thanks Dennis for the pointer.
Running the scala command actually compiles the file first behind the scenes, then runs the resulting program.
This behavior is explained in the help for the scala command.
You can get the help by executing the scala -help command.
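For example, a file like the following (a hypothetical HelloWorld.scala) contains only a self-contained compilation unit with exactly one runnable main method, so both invocations work:

// HelloWorld.scala
object HelloWorld {
  def main(args: Array[String]): Unit = {
    println("Hello, World!")
  }
}

// Run either way:
//   scala HelloWorld.scala                          (compiled behind the scenes, then run)
//   scalac HelloWorld.scala && scala HelloWorld     (explicit two-step compile and run)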
When used as a scripting language, does Scala have some sort of include directive, or is there a way to launch a script from another script?
The scala REPL has the :load filename command to load a Scala file interactively. Alternatively, the scala command's -i filename argument can be used to preload the file.
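For illustration, suppose you keep shared definitions in a file (the file name and contents here are hypothetical):

// defs.scala -- helper definitions to make available in the REPL
def greet(name: String): String = "Hello, " + name + "!"
val answer = 42

// Inside the REPL:        :load defs.scala
// Or preload on startup:  scala -i defs.scala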
As of the beginning of 2013, there seems to be no built-in support for multi-file scripts.
One developer has implemented #include support for non-interactive Scala scripts by assembling and compiling the files in a prior stage (I have not tried it yet).
Here is the blog post about it:
http://www.crosson.org/2012/01/simplifying-scala-scripts-adding.html
And the git repository:
https://github.com/dacr/bootstrap
I hope this, or something along these lines, becomes official some day, since the -i filename switch of scala seems to apply only to the interactive console.
Until then, a proper scripting language, like Ruby, might still remain the best option.
Ammonite is an option. It implements Scala (plus extensions), including import in scripts.
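For example, with Ammonite installed, one script can pull another in via its import $file mechanism (the file names here are hypothetical):

// util.sc -- a helper script in the same directory
def double(x: Int): Int = x * 2

// main.sc -- imports util.sc and uses it; run with: amm main.sc
import $file.util
println(util.double(21))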