Do cats and scalaz create performance overhead on application? - scala

I know this may be a nonsense question, but given my limited programming experience it came to my mind.
Cats and scalaz are used so that we can code in Scala in a pure functional style, similar to Haskell. But to achieve this we need to add those libraries to our projects, and then wrap our code with their objects and functions. That means extra code and extra dependencies.
I don't know whether these create larger objects in memory.
This is what got me thinking. So my question: will I face any performance issues, such as higher memory consumption, if I use cats/scalaz?
Or should I avoid them if my application needs performance?

Do cats and scalaz create performance overhead on application?
Absolutely.
The same way any line of code adds performance overhead.
So, if that is your concern, then don't write any code (well, actually the world might be simpler if we had never tried all this).
Now, snarky answer aside. The proper question you should be asking is: "Is the overhead of library X harmful to my software?"; remember this applies to any library, actually to any code you write, to any algorithm you pick, etc.
And, in order to answer that question, we need some things before.
Define the SLAs the software you are writing must hold. Without those, any performance question / observation you make is pointless. It doesn't matter if something is faster / slower if you don't know whether that is meaningful for you and your clients.
Once you have SLAs you need to perform stress tests to verify whether your current version of the software satisfies them. Because, if your current code is performant enough, then you should worry about other things like maintainability, testing, adding more features, etc.
PS: Remember that those SLAs should not be raw numbers but be expressed in terms of percentiles; the same goes for the results of the tests.
When you find that you are failing your SLAs, you need to do proper benchmarking and debugging to identify the bottlenecks of your project. As you saw, caring about performance would have to be done on each line of code, but that is a lot of work that usually doesn't produce any relevant output. Thus, instead of evaluating the performance of everything, we find the bottlenecks first: those small pieces of code that have the biggest contributions to the overall performance of your software (remember the Pareto principle).
Remember that in this step we have to look at the whole system; the network matters too (and you will see that it is usually the biggest slowdown; thus, you would usually rather look for architectural solutions, like using fibers instead of threads, than try to optimize small functions. Also, sometimes the easier and cheaper solution is better infrastructure).
When you find the bottleneck, you need to formulate some alternatives, implement them, and not only benchmark them but also do statistical hypothesis testing to validate whether the proposed changes are worth it or not. And, of course, validate whether they were enough to satisfy the SLAs.
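To make the point about percentiles concrete, here is a minimal sketch in Python (chosen only for brevity; the idea is the same on the JVM). The sample latencies and the SLA limits are made up for illustration, and `percentile_report` / `meets_sla` are hypothetical helper names, not part of any library.

```python
import random
import statistics

def percentile_report(latencies_ms):
    """Return the p50, p95 and p99 of a list of latency samples (milliseconds)."""
    # With n=100, statistics.quantiles returns the 99 cut points p1..p99.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}

def meets_sla(report, sla_ms):
    """True when every percentile named in the SLA is within its limit."""
    return all(report[name] <= limit for name, limit in sla_ms.items())

# Hypothetical stress-test results: mostly fast requests plus a 2% slow tail.
rng = random.Random(42)
samples = [rng.uniform(5, 20) for _ in range(980)]
samples += [rng.uniform(200, 400) for _ in range(20)]

report = percentile_report(samples)
sla = {"p50": 25.0, "p95": 50.0, "p99": 500.0}  # assumed targets, not real ones

print("percentiles:", report)
print("mean:", statistics.mean(samples))
print("SLA met:", meets_sla(report, sla))
```

The mean hides the slow tail that the p99 exposes, which is why a single raw number is a poor way to state an SLA.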
Thus, as you can see, performance is an art and a lot of work. So, unless you are committed to doing all this, stop worrying about something you will not measure and optimize properly.
Rather, focus on increasing the maintainability of your code. This actually also helps performance, because when you find that you need to change something you will be grateful that the code is as clean as possible and that the whole architecture of the code allows for an easy change.
And believe me when I say that using tools like cats, cats-effect, fs2, etc. will help in that regard. Also, they are actually pretty well optimized at their core, so you should be good for a lot of use cases.
Now, the big exception is that if you know that the work you are doing will be very CPU and memory bound then yeah, you pretty much can be sure all those abstractions will be harmful. In those cases, you may even want to stay away from the JVM and rather write pretty low-level code in a language like Rust which will provide you with proper tools for that kind of problem and still be way safer than plain old C.

Related

Cost of Scala's immutable object creation [closed]

I see posts like the for-comprehension in [1] and it really makes me wonder what the overall implication of using the immutable Map vs a mutable one is. It seems like Scala developers are very comfortable with letting every "mutation" of an immutable data structure incur the cost of a new object, or maybe I'm just missing something. I understand that having every mutation operation on an immutable data structure return a new instance is good for thread safety, but what if I already know how to fine-tune my mutable objects to give the same guarantees?
[1] In Scala, how can I do the equivalent of an SQL SUM and GROUP BY?
In general, the only way to answer this kind of performance question is to profile your real-world code. Microbenchmarks are often misleading (see e.g. this benchmarking tale) - and particularly if you're talking about concurrency the best strategy can be very different depending on how concurrent your use case is in practice.
In theory, a Sufficiently Smart Compiler™ should be able - perhaps with the help of a linear type system (inferred or otherwise) - to reproduce all the efficiency advantages of a mutable data structure. In fact, since it has more information available about the programmer's intent and is less constrained by incidental details that the programmer had to specify, such a compiler ought to be able to generate higher-performance code - and e.g. GCC rewrites code into immutable form (SSA) for optimization purposes. For an example that hits closer to home, many real-world Java programs have perfectly adequate throughput, but have issues with latency caused by Java's garbage collector stopping the world to compact the heap. A JVM that was aware that certain objects were immutable would be able to move them without stopping the world (you can simply copy the object, update all references to it, and then delete the old copy, since it doesn't matter if some threads see the old version while some of them see the new one).
In practice, it depends, and again the only way is to benchmark your specific case. In my experience, for the level of investment of programmer time that's available for most practical business problems, spending x hours on a (immutable) Scala version tends to yield a more performant program than spending the same time on a mutable Scala or Java version - indeed, in the amount of programmer time it takes to produce an acceptably-performing Scala version it would probably be impossible to complete a Java version at all (particularly if we require the same defect rate). On the other hand, if you have unlimited expert programmer time available and need to get the absolute best performance possible, you would probably want to use a very low-level mutable language (this is why LAPACK is still written in Fortran) - or even implement your algorithm directly on an FPGA as JP Morgan recently did.
But even in this case you probably want to have a prototype in a higher-level language so that you can write tests and compare the two to confirm that the high-performance implementation works correctly. Particularly if we're just talking about mutable vs. immutable in Scala, premature optimization is the root of all evil. Write your program, and then if performance is inadequate, profile it and look at the hotspots. If you really are spending too much time copying an immutable data structure, that's an appropriate time to replace it with a mutable version, and carefully check the thread safety guarantees by hand. If you're writing properly decoupled code then it should be easy to replace the performance-critical pieces as and when you need to, and until then you can reap the development time gains of code that's simpler and easier to reason about (particularly in concurrency cases). In my experience performance problems in well-written code are a lot less likely than people expect; most software performance issues are caused by a poor choice of algorithm or data structure rather than this kind of small overhead.
Your question starts with a wrong assumption, based on a misunderstanding of the cost incurred by using immutable objects.
Working with guaranteed immutable objects that are built from immutable objects allows you to use structural sharing: you can create new objects based on the old ones without resorting to a deep copy, because, roughly speaking, the new object can reuse parts of the object it is based on.
So this mitigates the impact of using immutable objects greatly.
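As a rough illustration of structural sharing, here is a tiny immutable linked list sketched in Python. This is not how the Scala collections are implemented; `Node`, `prepend` and `to_list` are hypothetical names made up for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    """One cell of an immutable singly linked list."""
    head: int
    tail: Optional["Node"] = None

def prepend(value, lst):
    """Return a new list with value in front; the old list is reused as the tail."""
    return Node(value, lst)

def to_list(lst):
    """Collect the elements into a regular Python list, for inspection only."""
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out

base = prepend(2, prepend(3, None))  # the list 2 -> 3
a = prepend(1, base)                 # the list 1 -> 2 -> 3
b = prepend(0, base)                 # the list 0 -> 2 -> 3

print(to_list(a), to_list(b), to_list(base))
print("tails are the same object:", a.tail is b.tail)
```

Each "modified" list costs one new cell; the cells of `base` are never copied, and sharing them is safe precisely because nobody can mutate them.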
So what is the difference compared to fine-tuned, hand-crafted mutable objects?
Immutable objects fit the FP paradigm better
Compile-time optimization and checks
A lower chance of runtime exceptions
The question is very generic, so it is hard to give a definite answer. It seems that you are just uncomfortable with the amount of object allocation happening in idiomatic Scala code using for comprehensions and the like.
The Scala compiler does not do any special magic to fuse operations or to elide object allocations. It is up to the person writing the data structure to make sure that functional data structures reuse as much as possible from previous versions (structural sharing). Many of the data structures used in the Scala collections do this reasonably well. See for example this talk about Functional Data Structures in Scala to give you a general idea.
If you are interested in the details, the book to get is Purely Functional Data Structures by Chris Okasaki. The material in this book applies also to other functional languages like Haskell and OCaml and Clojure.
The JVM is extremely good at allocating and collecting short-lived objects. So many things that seem outrageously inefficient to somebody accustomed to low-level programming are actually surprisingly efficient. But there are definitely situations where mutable state has performance or other advantages. That is why Scala does not forbid mutable state, but only has a preference towards immutability. If you find that you really need mutable state for performance reasons, it is usually a good idea to wrap your mutable state in an Akka actor instead of trying to get low-level thread synchronization right.

Any performance gain/loss with having several function calls rather than a single large one?

I am currently making a game for the iPad and iPhone using cocos2d, Box2D and Objective-C.
A lot of stuff is happening every update, and a lot has to be resolved.
I recently refactored a lot of my code to several small methods, instead of having hundreds of lines of code inside the same method.
Is there any performance loss doing this?
Will fewer method calls increase performance?
Each function call results in a constant-time (O(1)) delay because of the stack frame adjustments and branching. However, you won't feel that delay unless the calls are made inside a time-critical loop a million times.
The best approach would be, I think, writing the cleanest code possible and then optimizing it -- with the help of a profiler -- as needed.
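If you want to see the per-call cost for yourself, a quick measurement looks roughly like the sketch below. It is written in Python for brevity, so the absolute numbers say nothing about Objective-C, where the compiler may inline small functions and the ratio will be very different; the function names are made up for the example.

```python
import timeit

def sum_squares_inline(n):
    """Everything in one loop body, no helper calls."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def square(x):
    """Tiny helper; calling it adds one function call per iteration."""
    return x * x

def sum_squares_calls(n):
    """Same computation, but delegating to the helper."""
    total = 0
    for i in range(n):
        total += square(i)
    return total

n = 100_000
# Both versions must compute the same result before timing means anything.
assert sum_squares_inline(n) == sum_squares_calls(n)

t_inline = timeit.timeit(lambda: sum_squares_inline(n), number=10)
t_calls = timeit.timeit(lambda: sum_squares_calls(n), number=10)
print(f"inline: {t_inline:.3f}s  with calls: {t_calls:.3f}s")
```

Whatever difference you measure only matters if this loop is actually one of your hotspots, which is what the profiler is for.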
You may also want to check out this answer: https://stackoverflow.com/a/4816703/252687 Inline functions may reduce the aforementioned overhead a bit without compromising the modularity.
I have seen cases where multiple smaller functions resulted in significantly better-performing code, since the compiler was better able to optimize registers. Highly dependent on the compiler and style of programming, though.
But in general, on modern systems (other than really low-level microprocessors) optimizing performance at this level is counter-productive. Better to well-structure the code (which generally implies a fair number of subroutines) so that it's more reliable, easier to maintain, and easier to spot and fix more global performance issues.
Of course there is a performance decline with more method calls. However, that is not a reason to use fewer; that would be premature optimization at the expense of cleaner code.
Personally I go for the cleanest, clearest code, let the compiler optimize, and in the end profile for the real bottlenecks.
I was once hired on the basis of my answer to a single question: I said I would profile before optimizing. :-)
After the compiler optimizes your code, you probably won't notice any reliable performance difference, unless you are trying to use method dispatches inside the inner loops of a CPU intensive computation routine, such as DSP or pixel level image processing.

Does Perl language aim at producing fast programs at runtime?

I recently had a friend tell me
"see perl was never designed to be fast"
Is that true?
The relevant piece of information I can find is this from Wikipedia:
The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal).
But it doesn't directly talk about speed. I think that with all the text processing that it needs to do, speed of execution really matters for a language like Perl. And with all the weird syntax, elegance was never an objective, I agree.
Was high speed of execution one of the design objectives of Perl?
There is one important aspect to be considered: algorithms. Perl's secret weapons are the algorithms backing certain language features and the CPAN library.
Good algorithms trump raw execution speed for non-trivial problems. It typically takes more effort to select and implement algorithms in C-like languages than in Perl. This means that, given half a day to code some little tool, the Perl version often outperforms a C version, because it was easier to build good data structures with hashes and with the features provided by the language and libraries.
Once a Perl script starts running (i.e. after loading and compiling everything), it can be very speedy. It's that yucky compile-every-time that's a bit nasty.
However, I find that people don't really have to worry about how fast Perl can be. They waste all of their time by implementing stupid designs that do a lot more work than they need to do, misunderstanding key technologies, or just being boneheaded. It's not uncommon for me to help someone make their stuff go orders of magnitude faster by just tuning in the right places. That's not particular to Perl though. People have that problem with every language.
Perl has always aimed toward practicality, not anything (even close to) some sort of ivory tower purity, where a few goals are given absolute priority, and others are ignored (completely or nearly so).
As such, I think it's reasonable to say that maintaining a reasonable speed of execution has always been seen as important for Perl, but there are other factors (especially things like flexibility and ease of use) that are generally more important, so if a choice has to be made between one of them and speed of execution, the other factor will generally win unless the effect on execution speed is really serious.
I would have said that a language designed for optimal run-time performance would not have constructs that allow compiling while running. So no, perhaps.
It became a design objective as of Perl 5.0. But keep in mind it is still interpreted, so it is fast for an interpreted language.

How to use the cachegrind output to optimize the application

I need to improve the throughput of the system.
The usual cycle of optimization has been done and we have already achieved 1.5X better throughput.
I am now beginning to wonder if I can utilize the cachegrind output to improve the system's throughput.
Can somebody point me to how to begin on this?
What I understand is we need to ensure most frequently used data should be kept small enough so that it remains in L1 cache and the next set of data should fit in the L2.
Is this the right direction I am taking?
It's true that cachegrind output in itself does not give much information on how to go about optimizing code. One needs to know how to interpret it, and what you are saying about data fitting into L1 and L2 is indeed the right direction.
To fully understand how memory access patterns influence performance, I recommend reading an excellent paper "What Every Programmer Should Know About Memory" by Ulrich Drepper, the GNU libc maintainer.
If you're having trouble parsing the cachegrind output, look into KCacheGrind (it should be available in your distro of choice). I use it and find it quite helpful.
According to the Cachegrind documentation, the details given to you by cachegrind are the number of cache misses for a given part of your code. You need to know how caches work on the architecture you are targeting so that you know how to fix the code. In practice this means making data smaller or changing the access pattern of some data so that cached data is still in the cache. However, you need to understand your program's data and data access before you can act on the information. As it says in the manual,
In short, Cachegrind can tell you where some of the bottlenecks in your code are, but it can't tell you how to fix them. You have to work that out for yourself. But at least you have the information!
1.5x is a nice speedup. It means you found something that took 33% of the time that you could get rid of. I bet you can do more, even before you get down to low-level issues like data memory cache. This is an example of how. Basically, you could have additional performance problems (and opportunities for speedup) that were not large before, like 25% say. Well, with the 1.5x speedup, that 25% is now 37.5%, so it is "worth more" than it was. Often such a problem is in the form of some mid-stack function call that is requesting work that, once you know how much it costs, you may decide isn't completely necessary. Since kcachegrind does not really pinpoint these, you may not realize it is a problem.
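The arithmetic behind those percentages can be checked with a few lines of Python. This is only a sketch; it assumes the remaining 25% problem was not touched by the first optimization, and the function names are made up for the example.

```python
from fractions import Fraction

def removed_fraction(speedup):
    """Fraction of the original run time eliminated by a given overall speedup."""
    return 1 - 1 / speedup

def share_after_speedup(original_share, speedup):
    """New share of run time for a cost the speedup did not touch.

    The cost keeps its absolute time while the total shrinks by the
    speedup factor, so its share grows by that same factor.
    """
    return original_share * speedup

speedup = Fraction(3, 2)  # the 1.5x from the question
print("time removed:", removed_fraction(speedup))
print("a 25% cost becomes:", float(share_after_speedup(Fraction(1, 4), speedup)))
```

Exact fractions are used so that the 1/3 and 3/8 come out without floating-point noise.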

Do software metrics work both ways

I just started working for a large company. In a recent internal audit measuring metrics such as cyclomatic complexity and file sizes, it turned out that several modules, including the one owned by my team, have a very high index. So for the last week we have all been concentrating on lowering these indexes for our code by removing decision points and splitting files.
Maybe I am missing something, being the new guy, but how will this make our software better? I know that software metrics can measure how good your code is, but does it work the other way around? Will our code become better just because, for example, we are making a 10000 lines file into 4 2500 lines files?
The purpose of metrics is to have more control over your project. They are not a goal on their own, but can help to increase the overall quality and/or to spot design disharmonies. Cyclomatic complexity is just one of them.
Test coverage is another one. It is however well known that you can get high test coverage and still have a poor test suite, or the opposite, a great test suite that focuses on one part of the code. The same happens for cyclomatic complexity. Consider the context of each metric, and whether there is something to improve.
You should try to avoid accidental complexity, but if the processing has essential complexity, your code will be more complicated anyway. Try then to write maintainable code with a fair balance between the number of methods and their size.
A great book to look at is "Object-oriented metrics in practice".
It depends on how you define "better". Smaller files and lower cyclomatic complexity generally make code easier to maintain. Of course the code itself could still be wrong, and unit tests and other test methods will help with that. It's just a part of making code more maintainable.
Code is easier to understand and manage in smaller chunks.
It is a good idea to group related bits of code in their own functional areas for improved readability and cohesiveness.
Having a whole large program all in a single file will make your project very difficult to debug, extend, and maintain. I think this is quite obvious.
The particular metric is really only a rule of thumb and should not be followed religiously, but it may indicate something is not as nice as it could be.
Whether legacy working code should be touched and refactored is something that needs to be evaluated. If you decide to do so, you should consider writing tests for it first, that way you'll quickly know whether your changes broke any required behavior.
Have you never opened one of your own projects again after several months? The larger and more complex the individual components are, the more one asks oneself what genius wrote that code and why the heck he wrote it that way.
And there's never too much, or even enough, documentation. So if the components themselves are smaller and less complex, it's easier to re-understand 'em.
This is a bit subjective. The idea of assigning a maximum cyclomatic complexity index is to improve the maintainability and the readability of the code.
As an example, from the perspective of unit testing, it is really convenient to have smaller "units". And avoiding long stretches of code will help the reader to understand it. You cannot ensure that the original developer works on the code forever, so from the company's perspective it is fair to set such a criterion to keep the code "simple".
It is easy to write code that can be understood by a computer. It is harder to write code that can be understood by a human.
how will this make our software better?
Excerpt from the article Fighting Fabricated Complexity, related to NDepend, a tool for .NET developers. NDepend is good at helping teams manage large and complex code bases. The idea is that code metrics are good at reducing fabricated complexity in the code implementation:
During my interview on code metrics with Scott Hanselman, Scott had a particularly relevant remark.
Basically, while I was explaining that long and complex methods are killing quality and should be split into smaller methods, Scott asked me:
looking at this big too complicated method and I break it up into smaller methods, the complexity of the business problem is still there, looking at my application I can say, this is no longer complex from the method perspective, but the software itself, the way it is coupled with other bits of code, may indicate other problem…
Software complexity is a subjective measure relative to the human cognition capacity. Something is complex when it requires effort to be understood by a human. The fact is that software complexity is a 2 dimensional measure. To understand a piece of code one must understand both:
what this piece of code is supposed to do at run-time, the behavior of the code, this is the business problem complexity
how the actual implementation does achieve the business problem, what was the developer mental state while she wrote the code, this is the implementation complexity.
Business problem complexity lies into the specification of the program and reducing it means working on the behavior of the code itself. On the other hand, we are talking of fabricated complexity when it comes to the complexity of the implementation: it is fabricated in the sense that it can be reduced without altering the behavior of the code.
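To see what splitting a method does to the metric, here is a rough sketch that counts decision points with Python's `ast` module. It is a simplified approximation of cyclomatic complexity, not the exact formula any particular tool (NDepend included) uses, and the `price` / `discount` functions are invented for the example.

```python
import ast

# Node types counted as one decision point each (a simplification).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)

def rough_complexity(func_node):
    """Approximate cyclomatic complexity: 1 plus the number of decision points."""
    score = 1
    for node in ast.walk(func_node):
        if isinstance(node, DECISION_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # Each extra operand of `and` / `or` adds one more path.
            score += len(node.values) - 1
    return score

def complexity_per_function(source):
    """Map each function name in the source to its approximate complexity."""
    tree = ast.parse(source)
    return {f.name: rough_complexity(f)
            for f in ast.walk(tree) if isinstance(f, ast.FunctionDef)}

BIG = '''
def price(order):
    total = 0
    for item in order:
        if item["qty"] > 10 and item["vip"]:
            total += item["cost"] * 0.8
        elif item["qty"] > 10:
            total += item["cost"] * 0.9
        else:
            total += item["cost"]
    return total
'''

SPLIT = '''
def discount(item):
    if item["qty"] > 10 and item["vip"]:
        return 0.8
    if item["qty"] > 10:
        return 0.9
    return 1.0

def price(order):
    total = 0
    for item in order:
        total += item["cost"] * discount(item)
    return total
'''

big, split = complexity_per_function(BIG), complexity_per_function(SPLIT)
print("before:", big)
print("after: ", split)
# Decision points overall (score minus 1 per function) before and after.
print(sum(v - 1 for v in big.values()), sum(v - 1 for v in split.values()))
```

The per-method number drops after the split, but the total number of decisions in the program is unchanged: the complexity of the business rule is still there, it has only been given names.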
how will this make our software better?
It can be a trigger for a refactoring, but following one metric doesn't guarantee that all other quality metrics stay the same. And tools are only able to follow very few metrics. You can't measure to which degree code is understandable.
Will our code become better just because for example we are making a 10 000 lines file into 4 2500 lines files?
Not necessarily. Sometimes the larger one can be more understandable, better structured and have fewer bugs.
Most design patterns, for example, "improve" your code by making it more general and maintainable, but often at the cost of added source lines.