From this comparison of serialization libraries on the JVM, it looks like it is faster to create an object in Scala than in Java. The difference is in nanoseconds, though.
Is there any real reason why it would take less time to create an object in Scala, or does the graph just reflect improper benchmarking or some other sort of imprecision?
A 40-nanosecond difference in object creation time is background noise on an Intel Core i7 920.
Assuming the numbers are an average over several runs, 40 nanoseconds is just 0.04 microseconds. Assuming that on Windows 7 64-bit the high-performance clock was functioning correctly, you're probably looking at hiccups in Windows, the phase of the moon, statistical error, measurement-program error, memory allocation implementation speeds, or something else entirely.
Scala creates more small objects automatically. This makes object creation faster on average but the serialization size larger.
Update on this older post:
The page has moved,
https://github.com/eishay/jvm-serializers/wiki
Scala is not listed there now; searching the group did not turn up an explanation why.
Old post, but arguments continued,
Scala is faster than Java until...
I think the Rex Kerr post is referring to the way Scala collection classes serialize their elements individually. I don't have benchmarks to hand, but this is not just slightly (and hard to prove, unprofitably) faster; I've benchmarked it as repeatedly faster. Presumably, the Scala coders know this.
See the code for the readObject/writeObject methods in the Scala immutable List collection here,
Code page for Immutable List
loop, loop...
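For readers who don't want to chase the link, the pattern being described is a pair of custom writeObject/readObject methods that walk the list and write each element, with an end marker instead of a length prefix. Below is a simplified sketch of that pattern, not the actual scala.collection.immutable.List source; the ChainEnd marker and class name are illustrative.

    import java.io.{ObjectInputStream, ObjectOutputStream}

    // End-of-list marker written after the last element (instead of a length prefix)
    case object ChainEnd extends Serializable

    // Simplified container whose elements are written out one by one ("loop, loop...")
    class SerializableChain[A](initial: List[A]) extends Serializable {
      @transient private var elems: List[A] = initial

      private def writeObject(out: ObjectOutputStream): Unit = {
        out.defaultWriteObject()           // no non-transient fields, but keeps the idiom
        var rest = elems
        while (rest.nonEmpty) {
          out.writeObject(rest.head)       // each element is written individually
          rest = rest.tail
        }
        out.writeObject(ChainEnd)
      }

      private def readObject(in: ObjectInputStream): Unit = {
        in.defaultReadObject()
        val buf = List.newBuilder[A]
        var obj = in.readObject()
        while (obj != ChainEnd) {          // loop until the end marker comes back
          buf += obj.asInstanceOf[A]
          obj = in.readObject()
        }
        elems = buf.result()
      }

      def toList: List[A] = elems
    }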
A quick look at the benchmark code,
benchmark code for Scala
shows use of JavaConversions._ which, given the ongoing flux of development, and bearing the above in mind, may have given (or may still give) Scala a slight advantage.
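For reference, this is roughly what a JavaConversions-based conversion looks like (an illustrative snippet, not the benchmark's actual code). The implicit conversion allocates a thin wrapper around the Java collection rather than copying its elements, which is exactly the kind of detail that can nudge a micro-benchmark one way or the other.

    import scala.collection.JavaConversions._   // implicit Java <-> Scala conversions (deprecated in later Scala versions)

    val javaList = new java.util.ArrayList[Int]()
    javaList.add(1)
    javaList.add(2)

    // The Java list is implicitly wrapped as a Scala Buffer; no elements are copied
    val wrapped: scala.collection.mutable.Buffer[Int] = javaList
    println(wrapped.sum)   // 3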
I know this may be a naive question, but due to my limited programming experience it came to my mind.
Cats and scalaz are used so that we can write Scala in a pure functional programming style, similar to Haskell. But to achieve this we need to add those libraries as extra dependencies to our projects, and to use them we end up wrapping our code in their objects and functions. That means extra code and extra dependencies.
I don't know whether these create larger objects in memory.
This is making me think. So my question: will I face any performance issues, such as higher memory consumption, if I use cats/scalaz?
Or should I avoid them if my application needs performance?
Do cats and scalaz create performance overhead in an application?
Absolutely.
The same way any line of code adds performance overhead.
So, if that is your concern, then don't write any code (well, actually the world may be simpler if we would have never tried all this).
Now, snarky answer aside, the proper question you should be asking is: "Is the overhead of library X harmful to my software?"; remember this applies to any library, and in fact to any code you write and any algorithm you pick.
And, in order to answer that question, we need a few things first.
Define the SLAs the software you are writing must hold. Without those, any performance question / observation you make is pointless. It doesn't matter if something is faster / slower if you don't know whether that is meaningful for you and your clients.
Once you have SLAs you need to perform stress tests to verify if your current version of the software satisfies those. Because, if your current code is performant enough, then you should worry about other things like maintainability, testing, adding more features, etc.
PS: Remember that those SLAs should not be raw numbers but be expressed in terms of percentiles, the same goes for the results of the tests.
When you find that you are failing your SLAs, then you need to do proper benchmarking and profiling to identify the bottlenecks of your project. As you saw, caring about performance could be done on each line of code, but that is a lot of work that usually doesn't produce any relevant output. Thus, instead of evaluating the performance of everything, we find the bottlenecks first: those small pieces of code that have the biggest contribution to the overall performance of your software (remember the Pareto principle).
Remember that in this step we have to look at the whole picture; the network matters too (and you will see that this last one is usually the biggest slowdown; thus, you would usually rather look for architectural solutions, like using Fibers instead of Threads, than try to optimize small functions. Also, sometimes the easier and cheaper solution is better infrastructure).
When you find the bottlenecks, you need to formulate some alternatives, implement them, and not only benchmark them but also do statistical hypothesis testing to validate whether the proposed changes are worth it. And, of course, verify that they were enough to satisfy the SLAs.
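For the benchmarking part, on the JVM this usually means a harness like JMH rather than hand-rolled timing loops. A minimal sketch using the sbt-jmh plugin follows; the two methods and the input data are hypothetical stand-ins for a "plain Scala" and a "cats-based" version of the same logic, and it assumes cats is on the benchmark classpath.

    import org.openjdk.jmh.annotations.{Benchmark, Scope, State}

    @State(Scope.Benchmark)
    class OverheadBench {
      val xs: List[Int] = (1 to 1000).toList

      // Plain standard-library version
      @Benchmark
      def plainSum: Int = xs.sum

      // Equivalent logic expressed through cats' Foldable/Monoid machinery
      @Benchmark
      def foldMapSum: Int = {
        import cats.instances.int._
        import cats.instances.list._
        import cats.syntax.foldable._
        xs.foldMap(identity)
      }
    }

Run it through the plugin's jmh:run task and compare the score distributions rather than single numbers; that is where the percentiles and the hypothesis testing mentioned above come in.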
Thus, as you can see, performance is an art and a lot of work. So, unless you are committed to doing all of this, stop worrying about something you will not measure and optimize properly.
Rather, focus on increasing the maintainability of your code. This actually helps performance too, because when you find that you need to change something you will be grateful that the code is as clean as possible and that the whole architecture of the code allows for an easy change.
And, believe me when I say that using tools like cats, cats-effect, fs2, etc. will help in that regard. Also, they are actually pretty optimized at their core, so you should be fine for a lot of use cases.
Now, the big exception is that if you know that the work you are doing will be very CPU and memory bound then yeah, you pretty much can be sure all those abstractions will be harmful. In those cases, you may even want to stay away from the JVM and rather write pretty low-level code in a language like Rust which will provide you with proper tools for that kind of problem and still be way safer than plain old C.
I am going to leave out a lot of 'irrelevant' details here to help people concentrate on the actual question.
I have a Swift project which involves a great deal of calculation (numerical integration, multi-parameter best fit, etc.). To speed things up I am aiming to use concurrent processing.
Using XCTest classes, I have discovered that with my closure calling a function defined within my module, if I use DispatchQueue.concurrentPerform with a single iteration, it takes time t. When repeated with 5 iterations, it runs about 5% slower (I am happy with that).
NB the function is a static function on a struct (my collection of calculus routines).
However, if I put the function in a separate module and import this, repeating the test with 1 iteration takes a similar time t. But now when I try it with 5 iterations, the call takes twice as long (105% slower in fact).
Swift version: 4.2.1
OS: macOS 10.14.3
Xcode 10.1
Processor: 6 core Core i7 (Mac mini 2018)
All 'objects' are structs and value types used everywhere, apart from the function reference.
Quick summary again: using DispatchQueue.concurrentPerform(), compared to the baseline time for 1 iteration of a same-module defined function, 5 iterations is 5% slower. However when doing the same process using a function which has been defined in an imported module, baseline time remains unchanged for 1 iteration but 5 iterations is 105% slower.
Can anyone explain why this happens, and hopefully suggest a way of avoiding this slowdown while keeping my collection in an importable module?
Feel free to ask questions if you need more information.
The problem has been solved. I have no clue what the cause was.
I removed my project, created a fresh one, imported the files back into it, and included it in the workspace.
Now 5 concurrent threads take only 10% longer than a single thread (while doing 5x the amount of work). I would still love to know what caused the issue, and it would be good if the efficiency drop were only 5%, as it was in the case mentioned earlier.
But for this improvement just from starting a fresh project, I'm not going to quibble!
I have written an application in Scala. Basically, the first step is to create an array of objects and then to initialise these objects from a CSV file. When running the application on the JVM it is really slow, and after some experimenting I found out that using the -J-Xincgc flag, which enables incremental garbage collection, speeds up the application by a factor of 4 (it's 4 times faster with the switch!). I wonder:
Why?
Did I use some inefficient coding, and if so, where should I start to find out what's going on?
Thanks!
I'll assume you're running this on hotspot.
The hotspot JVM has a whole zoo of garbage collectors, most of which also may have some sort of sub-modes or various command-line switches that significantly alter their behavior.
Which GC is used by default varies based on JVM version, operating system and 32/64bit VM.
So you basically changed whatever the default was to a specific algorithm that happened to perform "faster" for your workload.
But "faster" is a fuzzy measure. Wall time is not the same as CPU cycles spent if you consider multi-threading. And some collectors may simply choose to grow the heap more aggressively, thus deferring the cost of collection to a later point in time, which you might not have measured if your program didn't run long enough.
To make an accurate assessment, much more information would be needed:
what GC was used by default
your VM version
how many cores your CPU has
what kind of workload do you have (multi/single-thread, long/short-running, expected memory footprint, object allocation rate)
Oracle's GC tuning guide may prove useful for you.
In your case, -Xincgc translates to CMS in incremental mode, which is intended for single-core environments and has been deprecated as of Java 8. It probably just happened to be better than the default, but it's not necessarily an optimal choice.
If you get into a situation where you are running close to your heap-size limit, you can waste a lot of GC time, which can lead to a lot of false findings about performance. If that's your situation, first increase your heap-size limit before doing anything else. Consider use of jvisualvm to eyeball the situation - it's trivially easy to get started with.
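One concrete way to follow that advice, if the project happens to build with sbt, is to fork the run and turn on GC logging alongside a generous heap, so you can see whether collections are really where the time goes. The build.sbt snippet below is just a sketch; the flag names are the Java 7/8-era ones and differ on Java 9+.

    // build.sbt - javaOptions only take effect for forked runs
    fork := true
    javaOptions ++= Seq(
      "-Xmx2g",                // rule out an undersized heap before comparing collectors
      "-verbose:gc",           // basic GC logging
      "-XX:+PrintGCDetails"    // per-collection detail (pre-Java 9 flag)
    )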
I see posts like the for-comprehension in [1] and it really makes me wonder what the overall implication of using an immutable Map vs a mutable one is. It seems like Scala developers are very comfortable with letting mutations of immutable data structures incur the cost of a new object, or maybe I'm just missing something. If every mutation operation on an immutable data structure returns a new instance, then, although I understand it's good for thread safety, what if I already know how to fine-tune my mutable objects to make those same guarantees?
[1] In Scala, how can I do the equivalent of an SQL SUM and GROUP BY?
In general, the only way to answer these kind of performance questions is to profile them in your real-world code. Microbenchmarks are often misleading (see e.g. this benchmarking tale) - and particularly if you're talking about concurrency the best strategy can be very different depending on how concurrent your use case is in practice.
In theory, a Sufficiently Smart Compiler™ should be able - perhaps with the help of a linear type system (inferred or otherwise) - to reproduce all the efficiency advantages of a mutable data structure. In fact, since it has more information available about the programmer's intent and is less constrained by incidental details that the programmer had to specify, such a compiler ought to be able to generate higher-performance code - and e.g. GCC rewrites code into immutable form (SSA) for optimization purposes. For an example that hits closer to home, many real-world Java programs have perfectly adequate throughput, but have issues with latency caused by Java's garbage collector stopping the world to compact the heap. A JVM that was aware that certain objects were immutable would be able to move them without stopping the world (you can simply copy the object, update all references to it, and then delete the old copy, since it doesn't matter if some threads see the old version while some of them see the new one).
In practice, it depends, and again the only way is to benchmark your specific case. In my experience, for the level of investment of programmer time that's available for most practical business problems, spending x hours on a (immutable) Scala version tends to yield a more performant program than spending the same time on a mutable Scala or Java version - indeed, in the amount of programmer time it takes to produce an acceptably-performing Scala version it would probably be impossible to complete a Java version at all (particularly if we require the same defect rate). On the other hand, if you have unlimited expert programmer time available and need to get the absolute best performance possible, you would probably want to use a very low-level mutable language (this is why LAPACK is still written in Fortran) - or even implement your algorithm directly on an FPGA as JP Morgan recently did.
But even in this case you probably want to have a prototype in a higher-level language so that you can write tests and compare the two to confirm that the high-performance implementation works correctly. Particularly if we're just talking about mutable vs. immutable in Scala, premature optimization is the root of all evil. Write your program, and then if performance is inadequate, profile it and look at the hotspots. If you really are spending too much time copying an immutable data structure, that's an appropriate time to replace it with a mutable version, and carefully check the thread safety guarantees by hand. If you're writing properly decoupled code then it should be easy to replace the performance-critical pieces as and when you need to, and until then you can reap the development time gains of code that's simpler and easier to reason about (particularly in concurrency cases). In my experience performance problems in well-written code are a lot less likely than people expect; most software performance issues are caused by a poor choice of algorithm or data structure rather than this kind of small overhead.
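As a small illustration of that last point, here is a hedged sketch (all names are illustrative) of what "replace the performance-critical piece" often looks like in Scala: the mutation is confined to one function body, so callers still receive an immutable value and the rest of the code keeps its thread-safety reasoning.

    import scala.collection.mutable

    // Hot spot rewritten with a local mutable map; nothing mutable escapes the function
    def wordCounts(words: Seq[String]): Map[String, Int] = {
      val acc = mutable.Map.empty[String, Int]
      words.foreach { w =>
        acc.update(w, acc.getOrElse(w, 0) + 1)
      }
      acc.toMap   // hand an immutable snapshot back to the caller
    }

    // Callers are unaffected by the internal change
    val counts = wordCounts(Seq("a", "b", "a"))   // Map(a -> 2, b -> 1)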
Your question starts from a wrong assumption, based on a misunderstanding of the cost incurred by using immutable objects.
Working with guaranteed immutable objects that are built from immutable objects allows you to use structural sharing, so you can create new objects based on the old ones without having to resort to a deep copy, and you can, roughly speaking, reuse parts of the object the new one is based on.
So this greatly mitigates the cost of using immutable objects.
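A minimal illustration of structural sharing with the standard immutable List (the values are arbitrary): "adding" an element allocates a single new cons cell, and the new list simply points at the old one rather than copying it.

    val xs = List(2, 3, 4)
    val ys = 1 :: xs            // one new cell allocated; xs is untouched

    println(ys)                 // List(1, 2, 3, 4)
    println(ys.tail eq xs)      // true - the tail IS the old list, not a copy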
So what is the difference compared to fine-tuned, hand-crafted mutable objects?
immutable objects fit better with the FP paradigm
compile-time optimization and checks
lower chance of runtime exceptions
The question is very generic, so it is hard to give a definite answer. It seems that you are just uncomfortable with the amount of object allocation happening in idiomatic Scala code using for comprehensions and the like.
The Scala compiler does not do any special magic to fuse operations or to elide object allocations. It is up to the person writing the data structure to make sure that functional data structures reuse as much as possible from previous versions (structural sharing). Many of the data structures used in the Scala collections do this reasonably well. See for example this talk about Functional Data Structures in Scala to get a general idea.
If you are interested in the details, the book to get is Purely Functional Data Structures by Chris Okasaki. The material in this book also applies to other functional languages like Haskell, OCaml and Clojure.
The JVM is extremely good at allocating and collecting short-lived objects, so many things that seem outrageously inefficient to somebody accustomed to low-level programming are actually surprisingly efficient. But there are definitely situations where mutable state has performance or other advantages. That is why Scala does not forbid mutable state, but only has a preference towards immutability. If you find that you really need mutable state for performance reasons, it is usually a good idea to wrap your mutable state in an Akka actor instead of trying to get low-level thread synchronization right.
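A minimal sketch of that last suggestion using the classic Akka actor API (the Counter protocol and all names here are illustrative): the var lives only inside the actor, and since the actor processes one message at a time, no explicit locking is needed.

    import akka.actor.{Actor, ActorSystem, Props}

    // Hypothetical message protocol for the example
    case object Increment
    case object GetCount

    // The mutable counter lives only inside the actor; messages are processed one at a
    // time, so concurrent senders never observe a half-updated value
    class Counter extends Actor {
      private var count = 0
      def receive: Receive = {
        case Increment => count += 1
        case GetCount  => sender() ! count
      }
    }

    val system  = ActorSystem("example")
    val counter = system.actorOf(Props(new Counter), "counter")
    (1 to 1000).foreach(_ => counter ! Increment)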
Assume there are two classes of applications:
(1) Intensive number crunching and numerical and mathematical computations
(2) Intensive regular-expression matching, XPath searching, and other string manipulation where strings are mostly stored in collection classes.
In both cases, assume clients access these applications thousands of times per second, or even in parallel.
So if I have the choice to implement the applications in the server backends, I can choose either Java 7 or Scala. Which one should I choose to get faster performance and produce more reliable code?
Google did some benchmarks recently that you might find interesting - see paper linked to here: http://www.readwriteweb.com/hack/2011/06/cpp-go-java-scala-performance-benchmark.php
The paper is surprisingly un-scientific, but you will get a rough feel for what can be done. Of particular interest may be section V.F
Daniel Mahler improved the Scala version by creating a more functional version, which is kept in the Scala Pro directories. This version is only 270 lines of code, about 25% of the C++ version, and not only is it shorter, run-time also improved by about 3x. It should be noted that this version performs algorithmic improvements as well, and is therefore not directly comparable to the other Pro versions.
It's not clear to me whether this version with algorithmic improvements is included in their speed benchmark table (I don't think so), but it does indicate that you may be able to produce performance improvements by adopting algorithmic improvements that are more viable to implement in Scala. It won't do much for simple string processing, however.
A big factor will be how competent you are in programming these languages, and how good you are at optimizing them. Java is obviously more verbose but you're less likely to run into performance "gotchas".
Two points which might enable better performance for numerical computations than in Java:
The practical one: Scala makes it extremely easy to enable parallel computation of "embarrassingly parallel" problems. While the same could be done in Java it would require much more time and expertise, making it likely that it will only be done in rare circumstances.
The technical one: Scala can specialize generic data structures for primitive types, making boxing/unboxing unnecessary. The Java compiler is not able to do that.
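Two minimal sketches of these points (illustrative only; actual gains depend on the workload and the Scala version - parallel collections are built into the standard library up to 2.12 and live in a separate module from 2.13 on):

    // 1. "Embarrassingly parallel" work: .par distributes the map over a thread pool
    val values = (1 to 1000000).map(_.toDouble)
    val total  = values.par.map(x => math.sqrt(x) * math.log(x + 1)).sum

    // 2. Specialization: the compiler emits a primitive variant of this class for Double,
    //    so using it with Double values involves no boxing/unboxing
    class Accumulator[@specialized(Double) T](zero: T)(add: (T, T) => T) {
      private var acc: T = zero
      def +=(x: T): Unit = acc = add(acc, x)
      def value: T = acc
    }

    val sumAcc = new Accumulator[Double](0.0)(_ + _)
    values.foreach(sumAcc += _)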
Scala uses Java's String so the amount of possible improvements here is quite limited. But there are other data structures like ropes which provide better performance than String in some cases.
Depending on your expertise and effort, I would expect that you can get better results here or there. Normally, with an infinite amount of development time and money, you can improve, improve and improve your code in every language. (Think of bigger and bigger caches, specialised sorters, precomputed defaults and so on).
With a good understanding of both languages and some experience with the performance questions of your field, I wouldn't expect much difference, but you could save some time thanks to the more collection-friendly Scala approach, and the time saved on normal development could be spent on performance analysis and improvement.
There is in principle not really a reason why Scala would be faster than Java for number crunching applications.
I would not choose Java or Scala or any other JVM language if I wanted to write a serious high-performance number crunching application.
From my own experience (and of course this is only anecdotal evidence and definitely not proof that this is true in all cases), the JVM is not the best-suited platform for heavy number crunching. If raw number-crunching speed is important, you would probably be better off with something closer to the "metal", for example C++, which lets you use Intel SSE instructions and do other low-level optimizations, or use the GPU with CUDA if your algorithm is suitable for that.