When, exactly, does Scala code perform heap allocations? - scala

I've written a lot of high-performance Java code, and past experience has taught me that one of the most important things for getting peak performance is to be fully aware of the load you're putting on the garbage collector, and work to reduce that load in critical loops. For example, cryptographic hash functions should be written in such a way that they never allocate any memory.
Prior to Java 1.5 it was very easy to know exactly when Java code would be compiled into JVM bytecodes that performed heap allocation: only the "new" keyword in Java source code would result in a heap allocation. Post-Java-1.5 you can just look at the "desugaring" of the new language features (like int->Integer autoconversion).
I'm having a really, really, really hard time finding an equivalent answer for Scala, and this is a problem. Did I miss it? Where can I find a clear, comprehensive explanation of exactly when the Scala compiler will produce heap-allocating bytecodes? And no, the scala compiler source code isn't an answer since it changes over time -- I'm asking about the language rather than one specific compiler.
Edit: forgot to mention that the Java primitive "+" applied to Strings is a language-level construct that incurs heap allocation. So pre-Java-1.5 it should be "new and +-on-String".

I highly recommend getting to know the program JD-GUI, which decompiles .class files into java-ish code and lets you inspect it. I've learned a lot about how Scala code is compiled from looking at that.
http://jd.benow.ca/

Related

What #unspecialized annotation is for?

Reading a source code of Function2 I've noticed that #unspecialized was added recently (in scala 2.10). What is the reasoning behind it and how does it affect compilation? Why do we need it for Function*'s tupled, compose and some other methods?
I'd say that a safe guess is that it disables specialization for the target method. A good reason to disable specialization is to avoid bytecode bloat. Specializing every method indiscriminately is a bad idea, because each specialization is basically a distinct copy of the same method, and the bytecode size grows pretty fast. So I guess here that specializing Function2 was deemed generally worth the increased bytecode size, at the exception of tupled and compose which were not important enough to warrant an additonal increase. It's a delicate balance between code size and execution speed, the idea is to get the most bangs for the bucks.
Aside: as a funny illustration of how problematic the code bloat induced by specilization can be, see this recipe for a scala bomb :
Scala bomb? (like a zip bomb)

intellij idea 11, scala slow execution [duplicate]

I've been programming in Scala for a while and I like it but one thing I'm annoyed by is the time it takes to compile programs. It's seems like a small thing but with Java I could make small changes to my program, click the run button in netbeans, and BOOM, it's running, and over time compiling in scala seems to consume a lot of time. I hear that with many large projects a scripting language becomes very important because of the time compiling takes, a need that I didn't see arising when I was using Java.
But I'm coming from Java which as I understand it, is faster than any other compiled language, and is fast because of the reasons I switched to Scala(It's a very simple language).
So I wanted to ask, can I make Scala compile faster and will scalac ever be as fast as javac.
There are two aspects to the (lack of) speed for the Scala compiler.
Greater startup overhead
Scalac itself consists of a LOT of classes which have to be loaded and jit-compiled
Scalac has to search the classpath for all root packages and files. Depending on the size of your classpath this can take one to three extra seconds.
Overall, expect a startup overhead of scalac of 4-8 seconds, longer if you run it the first time so disk-caches are not filled.
Scala's answer to startup overhead is to either use fsc or to do continuous building with sbt. IntelliJ needs to be configured to use either option, otherwise its overhead even for small files is unreasonably large.
Slower compilation speed. Scalac manages about 500 up to 1000 lines/sec. Javac manages about 10 times that. There are several reasons for this.
Type inference is costly, in particular if it involves implicit search.
Scalac has to do type checking twice; once according to Scala's rules and a second time after erasure according to Java's rules.
Besides type checking there are about 15 transformation steps to go from Scala to Java, which all take time.
Scala typically generates many more classes per given file size than Java, in particular if functional idioms are heavily used. Bytecode generation and class writing takes time.
On the other hand, a 1000 line Scala program might correspond to a 2-3K line Java program, so some of the slower speed when counted in lines per second has to balanced against more functionality per line.
We are working on speed improvements (for instance by generating class files in parallel), but one cannot expect miracles on this front. Scalac will never be as fast as javac.
I believe the solution will lie in compile servers like fsc in conjunction with good dependency analysis so that only the minimal set of files has to be recompiled. We are working on that, too.
The Scala compiler is more sophisticated than Java's, providing type inference, implicit conversion, and a much more powerful type system. These features don't come for free, so I wouldn't expect scalac to ever be as fast as javac. This reflects a trade-off between the programmer doing the work and the compiler doing the work.
That said, compile times have already improved noticeably going from Scala 2.7 to Scala 2.8, and I expect the improvements to continue now that the dust has settled on 2.8. This page documents some of the ongoing efforts and ideas to improve the performance of the Scala compiler.
Martin Odersky provides much more detail in his answer.
You should be aware that Scala compilation takes at least an order of magnitude longer than Java to compile. The reasons for this are as follows:
Naming conventions (a file XY.scala file need not contain a class called XY and may contain multiple top-level classes). The compiler may therefore have to search more source files to find a given class/trait/object identifier.
Implicits - heavy use of implicits means the compiler needs to search any in-scope implicit conversion for a given method and rank them to find the "right" one. (i.e. the compiler has a massively-increased search domain when locating a method.)
The type system - the scala type system is way more complicated than Java's and hence takes more CPU time.
Type inference - type inference is computationally expensive and a job that javac does not need to do at all
scalac includes an 8-bit simulator of a fully armed and operational battle station, viewable using the magic key combination CTRL-ALT-F12 during the GenICode compilation phase.
The best way to do Scala is with IDEA and SBT. Set up an elementary SBT project (which it'll do for you, if you like) and run it in automatic compile mode (command ~compile) and when you save your project, SBT will recompile it.
You can also use the SBT plug-in for IDEA and attach an SBT action to each of your Run Configurations. The SBT plug-in also gives you an interactive SBT console within IDEA.
Either way (SBT running externally or SBT plug-in), SBT stays running and thus all the classes used in building your project get "warmed up" and JIT-ed and the start-up overhead is eliminated. Additionally, SBT compiles only source files that need it. It is by far the most efficient way to build Scala programs.
The latest revisions of Scala-IDE (Eclipse) are much better atmanaging incremental compilation.
See "What’s the best Scala build system?" for more.
The other solution is to integrate fsc - Fast offline compiler for the Scala 2 language - (as illustrated in this blog post) as a builder in your IDE.
But not in directly Eclipse though, as Daniel Spiewak mentions in the comments:
You shouldn't be using FSC within Eclipse directly, if only because Eclipse is already using FSC under the surface.
FSC is basically a thin layer on top of the resident compiler which is precisely the mechanism used by Eclipse to compile Scala projects.
Finally, as Jackson Davis reminds me in the comments:
sbt (Simple build Tool) also include some kind of "incremental" compilation (through triggered execution), even though it is not perfect, and enhanced incremental compilation is in the work for the upcoming 0.9 sbt version.
Use fsc - it is a fast scala compiler that sits as a background task and does not need loading all the time. It can reuse previous compiler instance.
I'm not sure if Netbeans scala plugin supports fsc (documentation says so), but I couldn't make it work. Try nightly builds of the plugin.
You can use the JRebel plugin which is free for Scala. So you can kind of "develop in the debugger" and JRebel would always reload the changed class on the spot.
I read some statement somewhere by Martin Odersky himself where he is saying that the searches for implicits (the compiler must make sure there is not more than one single implicit for the same conversion to rule out ambiguities) can keep the compiler busy. So it might be a good idea to handle implicits with care.
If it doesn't have to be 100% Scala, but also something similar, you might give Kotlin a try.
-- Oliver
I'm sure this will be down-voted, but extremely rapid turn-around is not always conducive to quality or productivity.
Take time to think more carefully and execute fewer development micro-cycles. Good Scala code is denser and more essential (i.e., free from incidental details and complexity). It demands more thought and that takes time (at least at first). You can progress well with fewer code / test / debug cycles that are individually a little longer and still improve your productivity and the quality of your work.
In short: Seek an optimum working pattern better suited to Scala.

Compiler cache for Scala?

Compilation in Scala is fairly slow. Are there any hopes to make it faster?
One thing which comes to my mind is Scala equivalent of ccache: a cache where compiler does not have to recompile some parts. I know that type inference make things more complicated, but I wonder whether it is feasible at all. Perhaps caching should be done on different level (e.g. AST) or it needs to do some kind of preprocessing.
I will be happy to see some estimates how much could be potentially saved if that kind of tool exists. What kind of challenges are needed to be solved to build it?
As well as SBT which only recompiles what's needed, JRebel helps to solve this problem and has Scala support.

AOT compilation or native code compilation of Scala?

My scala application needs to perform simple operations over large arrays of integers & doubles, and performance is a bottleneck. I've struggled to put my finger on exactly when certain optimizations kick in (e.g. escape analysis) although I can observe their results through various benchmarking. I'd love to do some AOT compilation of my scala application, so I can see or enforce (or implement) certain optimizations ... or compile to native code, if possible, so I can cut corners like bounds checking and observe if it makes a difference.
My question: what alternative compilation methods work for scala? I'm interested in tools like llvm, vmkit, soot, gcj, etc. Who is using those successfully with scala at this point, or are none of these methods currently compatible or maintained?
GCJ can compile JVM classes to native code. This blog describes tests done with Scala code: http://lampblogs.epfl.ch/b2evolution/blogs/index.php/2006/10/02/scala_goes_native_almost?blog=7
To answer my own question, there is no alternative backend for Scala except for the JVM. The .NET backend has been in development for a long time, but its status is unclear. The LLVM backend is also not yet ready for use, and it's not clear what its future is.

Speed Comparison: C++ vs Objective C [duplicate]

When programming a CPU intensive or GPU intensive application on the iPhone or other portable hardware, you have to make wise algorithmic decisions to make your code fast.
But even great algorithm choices can be slow if the language you're using performs more poorly than another.
Is there any hard data comparing Objective-C to C++, specifically on the iPhone but maybe just on the Mac desktop, for performance of various similar language aspects? I am very familiar with this article comparing C and Objective-C, but this is a larger question of comparing two object oriented languages to each other.
For example, is a C++ vtable lookup really faster than an Obj-C message? How much faster? Threading, polymorphism, sorting, etc. Before I go on a quest to build a project with duplicate object models and various test code, I want to know if anybody has already done this and what the results where. This type of testing and comparison is a project in and of itself and can take a considerable amount of time. Maybe this isn't one project, but two and only the outputs can be compared.
I'm looking for hard data, not evangelism. Like many of you I love and hate both languages for various reasons. Furthermore, if there is someone out there actively pursuing this same thing I'd be interesting in pitching in some code to see the end results, and I'm sure others would help out too. My guess is that they both have strengths and weaknesses, my goal is to find out precisely what they are so that they can be avoided/exploited in real-world scenarios.
Mike Ash has some hard numbers for performance of various Objective-C method calls versus C and C++ in his post "Performance Comparisons of Common Operations". Also, this post
by Savoy Software is an interesting read when it comes to tuning the performance of an iPhone application by using Objective-C++.
I tend to prefer the clean, descriptive syntax of Objective-C over Objective-C++, and have not found the language itself to be the source of my performance bottlenecks. I even tend to do things that I know sacrifice a little bit of performance if they make my code much more maintainable.
Yes, well written C++ is considerably faster. If you're writing performance critical programs and your C++ is not as fast as C (or within a few percent), something's wrong. If your ObjC implementation is as fast as C, then something's usually wrong -- i.e. the program is likely a bad example of ObjC OOD because it probably uses some 'dirty' tricks to step below the abstraction layer it is operating within, such as direct ivar accesses.
The Mike Ash 'comparison' is very misleading -- I would never recommend the approach to compare execution times of programs you have written, or recommend it to compare C vs C++ vs ObjC. The results presented are provided from a test with compiler optimizations disabled. A program compiled with optimizations disabled is rarely relevant when you are measuring execution times. To view it as a benchmark which compares C++ against Objective-C is flawed. The test also compares individual features, rather than entire, real world optimized implementations -- individual features are combined in very different ways with both languages. This is far from a realistic performance benchmark for optimized implementations. Examples: With optimizations enabled, IMP cache is as slow as virtual function calls. Static dispatch (as opposed to dynamic dispatch, e.g. using virtual) and calls to known C++ types (where dynamic dispatch may be bypassed) may be optimized aggressively. This process is called devirtualization, and when it is used, a member function which is declared virtual may even be inlined. In the case of the Mike Ash test where many calls are made to member functions which have been declared virtual and have empty bodies: these calls are optimized away entirely when the type is known because the compiler sees the implementation and is able to determine dynamic dispatch is unnecessary. The compiler can also eliminate calls to malloc in optimized builds (favoring stack storage). So, enabling compiler optimizations in any of C, C++, or Objective-C can produce dramatic differences in execution times.
That's not to say the presented results are entirely useless. You could get some useful information about external APIs if you want to determine if there are measurable differences between the times they spend in pthread_create or +[NSObject alloc] on one platform or architecture versus another. Of course, these two examples will be using optimized implementations in your test (unless you happen to be developing them). But for comparing one language to another in programs you compile… the presented results are useless with optimizations disabled.
Object Creation
Consider also object creation in ObjC - every object is allocated dynamically (e.g. on the heap). With C++, objects may be created on the stack (e.g. approximately as fast as creating a C struct and calling a simple function in many cases), on the heap, or as elements of abstract data types. Each time you allocate and free (e.g. via malloc/free), you may introduce a lock. When you create a C struct or C++ object on the stack, no lock is required (although interior members may use heap allocations) and it often costs just a few instructions or a few instructions plus a function call.
As well, ObjC objects are reference counted instances. The actual need for an object to be a std::shared_ptr in performance critical C++ is very rare. It's not necessary or desirable in C++ to make every instance a shared, reference counted instance. You have much more control over ownership and lifetime with C++.
Arrays and Collections
Arrays and many collections in C and C++ also use strongly typed containers and contiguous memory. Since the address of the next element's members are often known, the optimizer can do much more, and you have great cache and memory locality. With ObjC, that's far from reality for standard objects (e.g. NSObject).
Dispatch
Regarding methods, many C++ implementations use few virtual/dynamic calls, particularly in highly optimized programs. These are static method calls and fodder for the optimizers.
With ObjC methods, each method call (objc message send) is dynamic, and is consequently a firewall for the optimizer. Ultimately, that results in many restrictions or inconveniences regarding what you can and cannot do to keep performance at a minimum when writing performance critical ObjC. This may result in larger methods, IMP caching, frequent use of C.
Some realtime applications cannot use any ObjC messaging in their render paths. None -- audio rendering is a good example of this. ObjC dispatch is simply not designed for realtime purposes; Allocations and locks may happen behind the scenes when messaging objects, making the complexity/time of objc messaging unpredictable enough that the audio rendering may miss its deadline.
Other Features
C++ also provides generics/template implementations for many of its libraries. These optimize very well. They are typesafe, and a lot of inlining and optimizations may be made with templates (consider it polymorphism, optimization, and specialization which takes place at compilation). C++ adds several features which just are not available or comparable in strict ObjC. Trying to directly compare langs, objects, and libraries which are very different is not so useful -- it's a very small subset of actual realizations. It's better to expand the question to a library/framework or real program, considering many aspects of design and implementation.
Other Points
C and C++ symbols can be more easily removed and optimized away in various stages of the build (stripping, dead code elimination, inlining and early inlining, as well as Link Time Optimization). The benefits of this include reduced binary sizes, reduced launch/load times, reduced memory consumption, etc.. For a single app, that may not be such a big deal; but if you reuse a lot of code, and you should, then your shared libraries could add a lot of unnecessary weight to the program, if implemented ObjC -- unless you are prepared to jump through some flaming hoops. So scalability and reuse are also factors in medium/large projects, and groups where reuse is high.
Included Libraries
ObjC library implementors also optimize for the environment, so its library implementors can make use of some language and environment features to offer optimized implementations. Although there are some pretty significant restrictions when writing an optimized program in pure ObjC, some highly optimized implementations exist in Cocoa. This is one of Cocoa's strong points, although the C++ standard library (what some people call the STL) is no slouch either. Cocoa operates at a much higher level of abstraction than C++ -- if you don't know well what you're doing (or should be doing), operating closer to the metal can really cost you. Falling back on to a good library implementation if you are not an expert in some domain is a good thing, unless you are really prepared to learn. As well, Cocoa's environments are limited; you can find implementations/optimizations which make better use of the OS.
If you're writing optimized programs and have experience doing so in both C++ and ObjC, clean C++ implementations will often be twice as fast or faster than clean ObjC (yes, you can compare against Cocoa). If you know how to optimize, you can often do better than higher level, general purpose abstractions. Although, some optimized C++ implementations will be as fast as or slower than Cocoa's (e.g. my initial attempt at file I/O was slower than Cocoa's -- primarily because the C++ implementation initializes its memory).
A lot of it comes down to the language features you are familiar with. I use both langs, they both have different strengths and models/patterns. They complement each other quite well, and there are great libraries for both. If you're implementing a complex, performance critical program, correct use of C++'s features and libraries will give you much more control and provide significant advantages for optimization, such that in the right hands, "several times faster" is a good default expectation (don't expect to win every time, or without some work, however). Remember, it takes years to understand C++ well enough to really reach that point.
I keep the majority of my performance critical paths as C++, but also recognize that ObjC is also a very good solution for some problems, and that there are some very good libraries available.
It's very hard to collect "hard data" for this that's not misguiding.
The biggest problem with doing a feature-to-feature comparison like you suggest is that the two languages encourage very different coding styles. Objective-C is a dynamic language with duck typing, where typical C++ usage is static. The same object-oriented architecture problem would likely have very different ideal solutions using C++ or Objective-C.
My feeling (as I have programmed much in both languages, mostly on huge projects): To maximize Objective-C performance, it has to be written very close to C. Whereas with C++, it's possible to make much more use of the language without any performance penalty compared to C.
Which one is better? I don't know. For pure performance, C++ will always have the edge. But the OOP style of Objective-C definitely has its merits. I definitely think it is easier to keep a sane architecture with it.
This really isn't something that can be answered in general as it really depends on how you use the language features. Both languages will have things that they are fast at, things that they are slow at, and things that are sometimes fast and sometimes slow. It really depends on what you use and how you use it. The only way to be certain is to profile your code.
In Objective C you can also write c++ code, so it might be easier to code in Objective C for the most part, and if you find something that doesn't perform well in it, then you can have a go at writting a c++ version of it and seeing if that helps (C++ tends to optimize better at compile time). Objective C will be easier to use if APIs you are interfacing with are also written in it, plus you might find it's style of OOP is easier or more flexible.
In the end, you should go with what you know you can write safe, robust code in and if you find an area that needs special attention from the other language, then you can swap to that. X-Code does allow you to compile both in the same project.
I have a couple of tests I did on an iPhone 3G almost 2 years ago, there was no documentation or hard numbers around in those days. Not sure how valid they still are but the source code is posted and attached.
This isn't a very extensive test, I was mainly interested in NSArray vs C Array for iterating a large number of objects.
http://memo.tv/nsarray_vs_c_array_performance_comparison
http://memo.tv/nsarray_vs_c_array_performance_comparison_part_ii_makeobjectsperformselector
You can see the C Array is much faster at high iterations. Since then I've realized that the bottleneck is probably not the iteration of the NSArray but the sending of the message. I wanted to try methodForSelector and calling the methods directly to see how big the difference would be but never got round to it. According to Mike Ash's benchmarks it's just over 5x faster.
I don't have hard data for Objective C, but I do have a good place to look for C++.
C++ started as C with Classes according to Bjarne Stroustroup in his reflection on the early years of C++ (http://www2.research.att.com/~bs/hopl2.pdf), so C++ can be thought of (like Objective C) as pushing C to its limits for object orientation.
What are those limits? In the 1994-1997 time frame, a lot of researchers figured out that object-orientation came at a cost due to dynamic binding, e.g. when C++ functions are marked virtual and there may/may not be children classes that override these functions. (In Java and C#, all functions expect ctors are inherently virtual, and there isnt' much you can do about it.) In "A Study of Devirtualization Techniques for a Java Just-In-Time Compiler" from researchers at IBM Research Tokyo, they contrast the techniques used to deal with this, including one from Urz Hölzle and Gerald Aigner. Urz Hölzle, in a separate paper with Karel Driesen, had shown that on average 5.7% of time in C++ programs (and up to ~50%) was spent in calling virtual functions (e.g. vtables + thunks). He later worked with some Smalltalk researachers in what ended up the Java HotSpot VM to solve these problems in OO. Some of these features are being backported to C++ (e.g. 'protected' and Exception handling).
As I mentioned, C++ is static typed where Objective C is duck typed. The performance difference in execution (but not lines of code) probably is a result of this difference.
This study says to really get the performance in a CPU intensive game, you have to use C. The linked article is complete with a XCode project that you can run.
I believe the bottom line is: Use Objective-C where you must interact with the iPhone's functions (after all, putting trampolines everywhere can't be good for anyone), but when it comes to loops, things like vector object classes, or intensive array access, stick with C++ STL or C arrays to get good performance.
I mean it would be totally silly to see position = [[Vector3 alloc] init] ;. You're just asking for a performance hit if you use references counts on basic objects like a position vector.
yes. c++ reign supreme in performance/expresiveness/resource tradeoff.
"I'm looking for hard data, not evangelism". google is your best friend.
obj-c nsstring is swapped with c++'s by apple enginneers for performance. in a resource constrained devices, only c++ cuts it as a MAINSTREAM oop language.
NSString stringWithFormat is slow
obj-c oop abstraction is deconstructed into procedural-based c-structs for performance, otherwise a MAGNITUDE order slower than java! the author is also aware of message caching - yet no-go. so modeling lots of small players/enemies objects is done in oop with c++ or else, lots of Procedural structs with a simple OOP wrapper around it with obj-c. there can be one paradigm that equates Procedural + Object-Oriented Programming = obj-c.
http://ejourneyman.wordpress.com/2008/04/23/writing-a-ray-tracer-for-cocoa-objective-c/