Why does Spark's BLAS use f2jBLAS instead of native BLAS for level-1 routines? - scala

I found the following code in BLAS.scala:
// For level-1 routines, we use Java implementation.
private def f2jBLAS: NetlibBLAS = {
  if (_f2jBLAS == null) {
    _f2jBLAS = new F2jBLAS
  }
  _f2jBLAS
}
I would expect native BLAS to be faster than a pure Java implementation.
So why does Spark choose f2jBLAS for level-1 routines? Is there a reason I am not aware of?
Thank you!

The answer is most likely to be found in the Performance section of the readme file of the netlib-java repository.
Java has a reputation with older generation developers because Java applications were slow in the 1990s. Nowadays, the JIT ensures that Java applications keep pace with – or exceed the performance of – C / C++ / Fortran applications.
This is followed by charts showing detailed benchmark results for various BLAS routines in both pure Java (translated from Fortran with f2j) and from native BLAS on both Linux on ARM and macOS on x86_64. The ddot benchmark shows that on x86 (JRE for ARM doesn't seem to have JIT capabilities) F2J performs on par with the reference native BLAS implementation for longer vector sizes and even outperforms it for shorter vector sizes. The caveat here is that the JIT kicks in after a couple of invocations, which is not a problem as most ML algorithms are iterative in nature. Most of the level 1 routines are fairly simple and the JIT compiler is able to generate well optimised code. This is also why the tuning efforts in highly optimised BLAS implementations go into the level 2 and 3 routines.
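To make the "fairly simple" point concrete, here is a minimal sketch of what a level-1 dot product boils down to. It is not the F2J source: the class name DotSketch is made up for illustration, and the stride arguments (incx, incy) of the real ddot are left out. The point is only that the whole routine is one tight loop over primitive arrays, which is the kind of code a JIT compiler handles well.

```java
class DotSketch {
    // Simplified dot product: one multiply-add per element.
    // Assumption: unit stride, unlike the real BLAS ddot which also takes incx/incy.
    static double ddot(int n, double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] y = {4.0, 5.0, 6.0};
        // 1*4 + 2*5 + 3*6
        System.out.println(ddot(3, x, y));
    }
}
```

There is very little here for a hand-tuned native library to improve on, whereas level 2 and 3 routines leave room for blocking and cache-aware tricks.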

Related

Calling RenderScript from C / JNI

I'm looking to replace the C atan2 function with something more efficient. RenderScript does offer atan2, including versions that take vectors.
The examples I found demonstrate calling RenderScript from Java. Is it possible to call RS from C code? An example would be great.
Thanks
It used to be possible, though RS support in the NDK has been dropped for some time now. It may still be possible, but even the NDK samples no longer include RS samples. Starting with Android 7 you could try to use "Single Source RenderScript", described here, which is supposed to be possible from C/C++ code.
The efficiency gains you may see using RS are due to a few possible reasons (which are very platform dependent):
RS will parallelize operations over your data set. In some cases the function you are calling (such as atan2) may parallelize the operation, if possible.
Your RS code may be executed on a co-processor (such as a GPU or DSP).
The RS provided intrinsics and library functions are highly optimized for the platform. Using atan2 as an example again, it may be possible that the function is more optimized in the RS core than the standard C library as it could be using a co-processor or it could be using architecture specific optimized implementation (assembly).
All of that being said, your code can take an I/O hit when moving data between RS space (Allocation) and the non-RS code.
I have found two examples; here is the one I got to build and run:
https://github.com/adhere/NDKCallRenderScriptDemo
I've been searching for documentation of the C++ API but haven't found it.

Tooling for expressive, feature rich numeric computations on the JVM

I am looking for numeric computation tooling on the JVM. My major requirements are expressiveness/readability, ease of use, evaluation and features in terms of mathematical functions. I guess I am after something like the Matlab kernel (probably including some basic libraries and w/o graphics) on the JVM. I'd like to be able to "throw" computational code at a running JVM and have this code evaluated. I don't want to worry about types. Arbitrary precision and performance are not so important.
I guess there are some nice libraries out there but I think an appropriate language on top is needed to get the expressiveness.
Which tooling would you suggest for expressive, feature-rich numeric computation on the JVM?
From the jGroovyLab page:
The GroovyLab environment aims to provide a Matlab/Scilab like scientific computing platform that is supported by a scripting engine implemented in Groovy language. The GroovyLab user can work either with a Matlab-like command console, or with a flexible editor based on the jsyntaxpane (http://code.google.com/p/jsyntaxpane/) component, that offers more convenient code development. Also, GroovyLab supports Computer Algebra based on the symja (http://code.google.com/p/symja/) project.
And there is also GroovyLab:
GroovyLab is a collection of Groovy classes to provide matlab-like syntax and basic features (linear algebra, 2D/3D plots). It is based on jmathplot and jmatharray libs.
Groovy has a smooth learning curve for Java programmers and a flexible syntax similar to Ruby. It is also pretty easy to write a DSL on it.
Though Groovy's performance is pretty good for a dynamic language, you can use static compilation if you need it.
Most of Mathworks Matlab is built on the Intel Math Kernel Library (MKL), which is (IMHO) the unbeatable champion in linear algebra computations. There is Java support, but it costs 500 dollars (the MKL, not just the Java support)...
The second-best option if you want to use Java is jblas, which uses BLAS and LAPACK, the industry standards for linear algebra.
Pure Java libraries apparently perform horribly, see here...
Spire sounds like it's aiming at the area you're looking at. It takes advantage of a lot of recent Scala features such as macros to get decent performance without having to sacrifice the expressiveness of being in a high level language.
There's also breeze, which is targeted at machine learning but includes a fair amount of linear algebra stuff.
Depending how much work you want to get into and what languages you're already familiar with, Incanter in the Clojure world might be worth a look. Also quickly evolving in Clojure right now is core.matrix, which aims to encapsulate high-level common abstractions in linear algebra implemented with various methods or packages.
You highlighted expressiveness in your post, and the nice thing about Clojure is that, as a Lisp, it is possible to make or extend DSLs to closely match problem domains. This is one of the big draws of the language (and of Lisps in general).
I'm the original author of core.matrix for Clojure. So I have a clear affinity and much more knowledge in this specific space. That said, I'm still going to try and give you an honest answer :-)
I was in the same position as you a year or so back, looking for a solution for numeric computation that would be scalable, flexible and suitable for deployment as a clustered cloud service.
I ended up going with Clojure for the following reasons:
Functional Programming: Clojure is a functional programming language at heart, more so than most other languages (although not as much as Haskell...). Lazy infinite sequences, persistent data structures, immutability throughout etc. This makes for elegant code when you are dealing with big computations.
Metaprogramming: I saw a need to do code generation for vector / computational expressions. Hence being a Lisp was a big plus: once you have done code generation in a homoiconic language with a "whole language" macro system, it's hard to find anything else that comes close.
Concurrency: Clojure has an impressive and novel approach to multi-core concurrency. If you haven't seen it then watch: http://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey
Interactive REPL: Something I've always felt is very important for data work. You want to be able to work with your code / data "live" to get a real feel for its properties. Having a dynamically typed language with an interactive REPL works wonders here.
JVM based: a big advantage for pragmatic purposes, because of the huge library / tool ecosystem and the excellent engineering in the JVM as a runtime platform.
Community: I saw a lot of innovation going on in Clojure, particularly around the general area of data and analytics.
The main thing Clojure was lacking at that time was a good library / API for matrix operations. There were some nice tools in Incanter, but they weren't very general purpose or performant. Hence I started developing core.matrix, which is shaping up to be an idiomatic Clojure-flavoured equivalent of NumPy / SciPy. Right now it is still a work in progress but good enough for production use if you are careful.
In terms of low-level matrix support, I also maintain vectorz-clj, which is my attempt to provide a core.matrix implementation that offers high performance vector/matrix operations while remaining pure Java (i.e. no native dependencies). If you are interested in the performance of this, you may like to see:
http://clojurefun.wordpress.com/2013/03/07/achieving-awesome-numerical-performance-in-clojure/
My second choice after Clojure would have been Scala. I liked Scala's slightly greater maturity and decent static type system. Both the languages are JVM based so the library / tool side was a tie. It was probably the Lisp features that clinched it.
If you happen to have access to Mathematica, then it's fairly easy to get it working with the JVM by means of J/Link. For Clojure, Clojuratica is an excellent library to make that as seamless as possible, although it has not been maintained for a while and it may take some effort to get it working in modern environments again.

What is the fastest scheme implementation?

Obviously, that will depend on what you want to do: numerical analysis, threading, databases, etc. I've seen the benchmarks; Larceny and Bigloo seem to come out ahead. Is there any implementation of Scheme that performs pretty well in several different benchmarks? Are there any that can create code that runs faster than that produced by SBCL? I don't see why SBCL should be so fast - Scheme is a far simpler language than Common Lisp!
http://community.schemewiki.org/?Stalin
http://en.wikipedia.org/wiki/Stalin_(Scheme_implementation)
From Wikipedia:
Stalin (STAtic Language ImplementatioN) is an aggressive optimizing
batch whole-program Scheme compiler written by Jeffrey Mark Siskind.
It uses advanced flow analysis and type inference and a variety of
other optimization techniques to produce code. Stalin is intended for
production use in generating an optimized executable.
The compiler itself runs slowly, and there is little or no support for
debugging or other niceties. Full R4RS Scheme is supported, with a few
minor and rarely encountered omissions. Interfacing to external C
libraries is straightforward. The compiler itself does lifetime
analysis and hence does not generate as much garbage as might be
expected, but global reclamation of storage is done using the Boehm
garbage collector.
It seems that Stalin is no longer being developed.
Among the Schemes that are fully standards compliant (at least with R5RS) and ready for prime-time use, Chez Scheme must be the fastest.
Based on these benchmarks, it looks like Chez Scheme, Gambit, and Racket are roughly tied for the title of Fastest Scheme.

Are there any managed programming languages that compile to machine code?

Managed languages being the ones that handle memory cleanup for you.
EDIT I'm not talking about garbage collection. I was just interested in knowing about languages that would free() memory for me automatically, and still compile down to machine code.
You seem to be confusing "managed" and "garbage collection". While managed languages (for example C# and Java) often have automated garbage collection, "managed" actually refers to the fact that there is a "virtual machine" which executes your code (see http://en.wikipedia.org/wiki/Managed_code).
So for example the CLR (common language runtime) is the virtual machine executing .Net code, and the JVM (Java virtual machine) is the virtual machine executing Java code.
You can in fact have garbage collection for unmanaged languages (for example C++), and vice versa have managed languages without garbage collection (EDIT: I was looking for some but I can't seem to find any unless Objective C counts. I'm not sure it makes a huge amount of sense to create a managed language without garbage collection anyway)
Both Java and C# can in fact be compiled directly into machine code, so that they are executed directly rather than by a virtual machine. For .Net code this is done using NGEN (in fact the CLR compiles .Net assemblies into machine code as you execute them, so-called "Just in time" compilation)
EDIT: As an update to the update of your question, there are in fact a number of alternatives to garbage collection in a spectrum between the extremes of complete manual memory management and garbage collection, and a lot of languages which compile to machine code incorporate varying forms of memory management which don't require you to explicitly free memory.
Can I ask - is this an "out of interest" question, or are you trying to select a language for a project? If the latter, then why are you so interested in having your language compile down to machine code? Certainly in the case of .Net, having your code JIT compiled offers a number of performance advantages (in the majority of cases). Also, NGENing your code doesn't remove the dependency on the .Net framework.
lots:
LISP (and variants), Erlang, C# (under Mono), Haskell, Java (with gcj)
Sure there are. Java, for instance. (gcj)
However the term managed itself implies you have to carry some runtime around.
A few more, in the broader sense of "managed" meaning safe (via runtime type checking or exhaustive static analysis) and/or garbage collected:
OCaml
D
Ada
Prolog
Clean
Eiffel
Similar to Efraims's answer, any .NET program will compile to machine code as well, usually in two steps (JIT), but there is also the NGEN tool to pre-compile the MSIL to native code.
There is a semi-GC choice: GLib.
GLib uses reference counting to manage the lifespan of objects. When the reference count reaches 0, the object is cleaned up.
It is much more inconvenient than .NET, Java or Python, but when you have to use C, it's better than nothing.
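For anyone unfamiliar with reference counting, here is a minimal sketch of the idea, written in Java only to show the bookkeeping. It is not GLib code: the class Counted and its methods are hypothetical names, loosely mirroring the role that g_object_ref and g_object_unref play in GLib.

```java
class RefCountSketch {
    // Hypothetical reference-counted resource (illustration only, not a GLib API).
    static final class Counted {
        private int refs = 1;              // the creator holds the first reference
        private boolean released = false;

        // Take an additional reference.
        Counted ref() {
            if (released) throw new IllegalStateException("already released");
            refs++;
            return this;
        }

        // Drop one reference; the resource is cleaned up when the count reaches 0.
        void unref() {
            if (released) throw new IllegalStateException("already released");
            refs--;
            if (refs == 0) {
                released = true;           // stand-in for freeing the underlying memory
            }
        }

        boolean isReleased() {
            return released;
        }
    }

    public static void main(String[] args) {
        Counted c = new Counted();         // count = 1
        c.ref();                           // count = 2
        c.unref();                         // count = 1, still alive
        c.unref();                         // count = 0, released
        System.out.println(c.isReleased());
    }
}
```

The inconvenience is visible even in this toy: every ref must be paired with an unref by hand, and forgetting one either leaks the object or releases it too early.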

Code generation for Java JVM / .NET CLR

I am taking a compilers course at college and we must generate code for our invented language for any platform we want. I think the simplest case is generating code for the Java JVM or .NET CLR. Any suggestions on which one to choose, and which APIs out there can help me with this task? I already have all the semantic analysis done; I just need to generate code for a given program.
Thank you
From what I know, at a high level the two VMs are actually quite similar: both are classic stack-based machines, with largely high-level operations (e.g. virtual method dispatch is an opcode). That said, the CLR lets you get down to the metal if you want, as it has raw data pointers with arithmetic, raw function pointers, unions etc. It also has proper tailcalls. So, if the implementation of your language needs any of the above (e.g. the Scheme spec mandates tailcalls), or if it would benefit significantly from having those features, then you would probably want to go the CLR way.
The other advantage there is that you get a stock API to emit bytecode, System.Reflection.Emit. Even though it is somewhat limited for full-fledged compiler scenarios, it is still generally enough for a simple compiler.
With the JVM, the two main advantages you get are better portability and the fact that the bytecode itself is arguably simpler (because it has fewer features).
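Since both targets are stack machines, code generation for expressions looks much the same on either one: walk the tree in post-order, emit the operands, then emit the operator. Below is a rough sketch of that idea. The mnemonics push, add and mul are made up for illustration and are not real JVM or CIL opcodes, and the class and method names are hypothetical as well.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class StackCodegenSketch {
    // Tiny expression tree: an integer constant or a binary operation ('+' or '*').
    interface Expr {}

    static final class Num implements Expr {
        final int value;
        Num(int value) { this.value = value; }
    }

    static final class Bin implements Expr {
        final char op;
        final Expr left, right;
        Bin(char op, Expr left, Expr right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }
    }

    // Post-order walk: operands first, then the operator that consumes them.
    static void emit(Expr e, List<String> out) {
        if (e instanceof Num) {
            out.add("push " + ((Num) e).value);
        } else {
            Bin b = (Bin) e;
            emit(b.left, out);
            emit(b.right, out);
            out.add(b.op == '+' ? "add" : "mul");
        }
    }

    static List<String> compile(Expr e) {
        List<String> out = new ArrayList<>();
        emit(e, out);
        return out;
    }

    // Toy interpreter for the emitted instructions, to check the generated code.
    static int run(List<String> code) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String ins : code) {
            if (ins.startsWith("push ")) {
                stack.push(Integer.parseInt(ins.substring(5)));
            } else {
                int r = stack.pop();
                int l = stack.pop();
                stack.push(ins.equals("add") ? l + r : l * r);
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        Expr e = new Bin('*', new Bin('+', new Num(1), new Num(2)), new Num(3));
        List<String> code = compile(e);
        System.out.println(code);
        System.out.println(run(code));
    }
}
```

A real back end would replace the strings with actual bytecode, but the tree walk stays the same, so this part of the compiler carries over whichever VM is picked.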
Another option that I came across was a library called RunSharp, which can generate MSIL code at runtime using Emit, but in a nicer, more user-friendly way that is more like C#. The latest version of the library can be found here:
http://code.google.com/p/runsharp/
In .NET you can use the Reflection.Emit Namespace to generate MSIL code.
See the msdn link: http://msdn.microsoft.com/en-us/library/3y322t50.aspx