What are trade-offs of different values for Code metrics in a .NET application - code-metrics

I have a situation which demands the following code metrics.
Maintainability index > 80 &
Cyclomatic complexity < 20
But as per MSDN (http://msdn.microsoft.com/en-us/library/cc667398(v=VS.90).aspx).
Maintainability index > 20 is green
Cyclomatic complexity < 25 is
good.
I need to understand the following:
Is it worth investing time to improve maintainability Index from 20 to 80?
If we don't achieve a maintainability index > 80, will there be an increase in maintenance cost?
What is the impact of keeping Cyclomatic complexity < 25?
Thanks in advance.

Requirements may vary depending of the application role. MSDN gives you a general rule of thumb, for buisness or mission critical applications requirements can be higher.
In general, client, the one who pays for development, knows best about the application role, so it is up to her to decide, whether it worth investing in mantainability or not. There is no certain answer, each case should be evaluated separately.

The answer to (1) is "yes", if the situation "demands" it.
The answer to (2) varies. A higher maintainability index generally corresponds to lower maintenance costs, however, the code that is predicted to be difficult to maintain may work perfectly and may never require maintenance. And even if maintenance is suggested, it is up to the maintainers to decide how much they are willing to spend to maintain it.
As for (3), there is not much difference between cyclomatic complexities of 20 and 25. Both are higher than the recommended numbers I typically have seen. The higher numbers indicate more conditionals, and the more conditions there are, the harder it is to figure out what the code is doing and what needs to change.

Related

Do cats and scalaz create performance overhead on application?

I know it is totally a nonsense question but due to my illiteracy on programming skill this question came to my mind.
Cats and scalaz are used so that we can code in Scala similar to Haskell/in pure functional programming way. But for achieving this we need to add those libraries additionally with our projects. Eventually for using these we need to wrap our codes with their objects and functions. It is something adding extra codes and dependencies.
I don't know whether these create larger objects in memory.
These is making me think about. So my question: will I face any performance issue like more memory consumption if I use cats/scalaz ?
Or should I avoid these if my application needs performance?
Do cats and scalaz create performance overhead on application?
Absolutely.
The same way any line of code adds performance overhead.
So, if that is your concern, then don't write any code (well, actually the world may be simpler if we would have never tried all this).
Now, dick answer outside. The proper question you should be asking is: "Does the overhead of X library is harmful to my software?"; remember this applies to any library, actually to any code you write, to any algorithm you pick, etc.
And, in order to answer that question, we need some things before.
Define the SLAs the software you are writing must hold. Without those, any performance question / observation you made is pointless. It doesn't matter if something is faster / slower if you don't know if that is meaningful for you and your clients.
Once you have SLAs you need to perform stress tests to verify if your current version of the software satisfies those. Because, if your current code is performant enough, then you should worry about other things like maintainability, testing, adding more features, etc.
PS: Remember that those SLAs should not be raw numbers but be expressed in terms of percentiles, the same goes for the results of the tests.
When you found that you are falling your SLAs then you need to do proper benchmarking and debugging to identify the bottlenecks of your project. As you saw, caring about performance must be done on each line of code, but that is a lot of work that usually doesn't produce any relevant output. Thus, instead of evaluating the performance of everything, we find the bottlenecks first, those small pieces of code that have the biggest contributions to the overall performance of your software (remember the Pareto principle).
Remember that in this step, we have to be integral, network matters too. (and you will see this last one is usually the biggest slowdown; thus, usually you would rather search for architectural solutions like using Fibers instead of Threads rather than trying to optimize small functions. Also, sometimes the easier and cheaper solution is better infrastructure).
When you find the bottleneck, then you need to formulate some alternatives, implement those and not only benchmark them but do Statistical hypothesis testing to validate if the proposed changes are worth it or not. And, of course, validate if they were enough to satisfy the SLAs.
Thus, as you can see, performance is an art and a lot of work. So, unless you are committed to doing all this then stop worrying about something you will not measure and optimize properly.
Rather, focus on increasing the maintainability of your code. This actually also helps performance, because when you find that you need to change something you would be grateful that the code is as clean as possible and that the whole architecture of the code allows for an easy change.
And, believe me when I say that, using tools like cats, cats-effect, fs2, etc will help with that regard. Also, they actually pretty optimized on their core so you should be good for a lot of use cases.
Now, the big exception is that if you know that the work you are doing will be very CPU and memory bound then yeah, you pretty much can be sure all those abstractions will be harmful. In those cases, you may even want to stay away from the JVM and rather write pretty low-level code in a language like Rust which will provide you with proper tools for that kind of problem and still be way safer than plain old C.

What is the obvious advantage of using AMPL?

I am doing a project using CPLEX solver, on Netbeans with Java. We have several optimization problems to solve, I have already solved one of them by coding in Java all the constraints, objective and variables, without using AMPL. However, some people in my team want to use AMPL.
Thus, as I don't want to read all the AMPL book to find the answer, is there an obvious reason to rather use AMPL than coding all the constraints "manually"? Moreover, can AMPL be integrated in Netbeans ? I did not find any documentation about that.
Is AMPL useful when the constraints need to be "flexible" (I mean, we can't guess in advance the exact number of constraints, it depends on the parameters fixed by the user, modularity is a high importance factor...)
I am really curious to hear about that soon !
Thanks for help
AMPL is an algebraic modeling language and quoting from that link:
One advantage of AMPL is the similarity of its syntax to the
mathematical notation of optimization problems.
For example, this can allow you to define groups of constraints without knowing in advance the dimensions of the model. And, perhaps, you can make big changes to your model more quickly. (You'll have to think about how often you will actually do that.)
However, one could argue that the "obvious advantage" of AMPL is that it supports dozens of different solvers. You can create your model and solve it with CPLEX, but then decide that you want to use a different solver (e.g., Gurobi, Xpress, etc.). On the AMPL Solvers web page, they have the following recommendation:
We recommend that you then test alternative solvers to determine which
offers the best tradeoff of price and performance for your needs.
The AMPL API web page says that there is a Java API, so that should allow you to include it in a Netbeans project, but I have no experience with that.
At the end of the day, you could also argue that these "advantages" are a matter of taste. Using the CPLEX Java API directly, as you have already done, is certainly a valid solution if it meets your requirements. It may allow you to build the model more efficiently, use solver-specific/advanced features that might not be supported by AMPL, and to have more fine-grained control over the model formulation.
You have just coded an optimisation model to optimise your company's production of widgets. Your company got a really good deal on $SOLVER1 so that's what you're using.
Over the next ten years, you improve and extend that model as your bosses throw new requirements at you. By the end of that time, you may have tens of thousands of lines of optimisation code as part of a system that, by now, is absolutely critical to your company's operations.
Your company's original licensing deal has expired, and the manufacturers of $SOLVER1 have massively increased the licensing fees, so you're now paying hundreds of thousands a year in licensing costs.
Meanwhile, the boffins at a rival company have just released a new version of $SOLVER2. It has fancy new algorithms that could solve the widget optimisation problem 20% faster and find better solutions than $SOLVER1 is giving you. It doesn't cost any more than $SOLVER1 and the performance is better.
Meanwhile, the open-source community has released $FREESOLVER. It might not be quite as powerful as the top commercial options, but it's as good as $SOLVER1 was ten years ago, and if you weren't paying $100k/year for licensing you could rent an awful lot of server time to make up for it.
...so, did you write your optimisation model on a platform that lets you switch to a new solver and take advantage of these opportunities without having to jettison ten years' worth of code?
There are huge advantages to being able to switch solvers quickly and easily. I know of one company who uses three different solvers for their work: they try two different open-source solvers both running in the cloud, and if neither of those can find an adequate solution then they throw it to an expensive solver with smarter algorithms. The open-source solvers handle 90% of their problems, so they only have to use the commercial solver for the last 10%, which allows them to make significant savings on their licensing costs.
One option we've discussed at my work is to use a commercial solver for mission-critical work, and open-source alternatives for applications like training or small-scale prototyping where we don't have the same requirements. That way we can minimise the number of concurrent users we need to license for the commercial solver.
(And, yes, there is still an issue of lock-in with the platform, but platforms like AMPL are significantly cheaper than a high-end commercial solver.)
Totally agree with everything that rkersh says. Also note that you should never write your model in a way that hard-codes details of your problem sizes etc. whether you write in an algebraic modelling language or through one of the more direct APIs.
Also, working with a modelling language gives you an extra level/layer of abstraction which can help, especially in sharing or explaining your model to others, comparing with a range of standard problem types etc., but I prefer the more nuts-and-bolts 'feel' of working with the more direct APIs, and almost never need (or have time & budget) to reformulate my models that deeply.
Even GPL means "general" yet newer and newer GPLs coming to life, so a given GPL is "more general" to somet tasks than others... :-) In theory writing a compiler the most efficiently for Pascal or Perl should not matter, so in fact you could write in whatever language you want and yet you should not lose expressivity or efficiency (e.g. for C# which is in the same league for Java now, MS writes a better compiler than the opensource equivalent).
Humans are specializing - this is why we have gotten this far :-) . No different when it comes to achieve a given task to convert a business problem to a math model (aka modeling). The whole idea of having a given modeling layer is that
A. you have the outmost expressivity for that particular task (aka math modeling)
B. it enforces some best practicies for modeling what in GPL you are not "forced" to do (1. you are free to do 2. it is marketed to you as such = flexibility). E.g. AMPL, GAMS, others are mixing declarative code (aka model code) and procedural code (aka flow-control-like) which is not a good practice. On the other hand e.g. separating data and an abstract model is getting to ALL modeling languages but interestingly enough very slowly...
C. thru no.A you can maintain the code more efficiently than otherwise (contrary to API modeling - I have clients who say they turned to modelinglanguage becuase API modeling is a liability for rapid model revamp)
D. in theory you could be solver independent.
If you look around all modeling languages are trying to maintain no.C except OPL (that's for historical reasons). But even in case of OPL, you get constraint-programming and constraint-based scheduling (beside math-programming) what with AMPL/GAMS you don't, however solverindependent they are...
the $Solver1 and $Solver2 + $Freesolver comparison is a bit broken for 4 reasons
A. opensolvers are still very far away from commercial solvers in term of performance when it comes to large/complex problems (probably LP is getting to the exception) - I have clients - the fastest ever sales in my memory - when they tested commercial solvers after their "free-ride".
B. while indeed the scenario described in relation with $Solver1 and $Solver2 seems plausible ($Solver1, the incumbent is getting more expensive over time), we could witness just the other way around where the $Solver2 (a new comer) actualy increased its pricing 4x in 7 years and in some cases doubled it, while $Solver1 (the incumbent) has had no change.
C. mixing up modeling capabilities and solvers is a mistake. The whole idea is that somebody writes models in APIs IS the way to stick to a solver much more than thru modeling languages. At a minimum, as the Hungarians say "what you gain on the custom you lose it on the ferry", in other words, "freedom (i.e. flexibility) comes with using it responsibly"
D. owning a solver for development is NOT expensive at all, i.e. a company can maintain large # of solvers (for less than 10k$ a company could have +4 solvers for development) to test which is the fastest for any given model and then choose the best suited for deployment.
in addition, solver is just one piece of the puzzle. E.g. I have a client who has disparate data sources and it takes 8hours to create a model and 4hours to solve it. Would this client welcome a more efficient data handling suite or would it insist that the solver should be faster? Modelers are too isolated from the business in most cases and while in their mind a given model is perfect, how it is populated by data is secondary, yet it makes or breaks a good performance.
I witness that API modelers are moving to modeling languages, not the other way around for various reasons...
but as somebody wrote above, there are lots of "tastes in the game", so eventually if you feel more confortable with a given approach then nobody can blame you to choose so... :-) after all it is very difficult to compare the/an other approach since it's almost never there on a given case... so eventually what counts is speed from business problem to a model which solve fast in the given application context :-)
phew, it was long... but I gave all my shots... :-)
To keep it short to illustrate advantage/disadvantage of using AMPL just compare using Java(AMPL) instead of assembly language(CPLEX).

NoSQL and eventual consistency - real world examples [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this question
I'm looking for good examples of NoSQL apps that portray how to work with lack of transactionality as we know it in relational databases. I'm mostly interested in write-intensive code, as for mostly read-only code this is a much easier task. I've read a number of things about NoSQL in general, about CAP theorem, eventual consistency etc. However those things tend to concentrate on the database architecture for its own sake and not on the design patterns to use with it. I do understand that it's impossible to achieve full transactionality within a distributed app. This is exactly why I would like to understand where and how requirements should be lowered in order to make the task feasable.
EDIT:
It's not that eventual consistency is my goal on it's own. For the time being I don't really see how to use NoSQL to certain things that are write-intensive. Say: I have a simplistic auction system, where there are offers. In theory the first person to accept an offer wins. In practice I would like at least to guarantee that there is only a single winner and that people get their results in the same request. It's probably not feasable. But how to solve it in practice - maybe some requests could take longer than usual, because something went wrong. Maybe some requests should be automatically refreshed. It's just an example.
Let me explain CAP in purely intuitive terms. First, what C, A and P mean:
Consistency: From the standpoint of an external observer, each
"transaction" either fully completed or is fully rolled back. For example,
when making an amazon purchase the purchase confirmation, order status
update, inventory reduction etc should all appear 'in sync'
regardless of the internal partitioning into sub-systems
Availablility: 100% of requests are completed successfully.
Partition Tolerance: Any given request can be completed even if a
subset of nodes in the system are unavailable.
What do these imply from a system design standpoint? what is the tension which CAP defines?
To achieve P, we needs replicas. Lots of em! The more replicas we keep, the better the chances are that any piece of data we need will be available even if some nodes are offline. For absolute "P" we should replicate every single data item to every node in the system. (Obviously in real life we compromise on 2, 3, etc)
To achieve A, we need no single point of failure. That means that "primary/secondary" or "master/slave" replication configurations go out the window since the master/primary is a single point of failure. We need to go with multiple master configurations. To achieve absolute "A", any single replica must be able to handle reads and writes independently of the other replicas. (in reality we compromise on async, queue based, quorums, etc)
To achieve C, we need a "single version of truth" in the system. Meaning that if I write to node A and then immediately read back from node B, node B should return the up-to-date value. Obviously this can't happen in a truly distributed multi-master system.
So, what is the solution to your question? Probably to loosen up some of the constraints, and to compromise on the others.
For example, to achieve a "full write consistency" guarantee in a system with n replicas, the # of reads + the # of writes must be greater or equal to n : r + w >= n. This is easy to explain with an example: if I store each item on 3 replicas, then I have a few options to guarantee consistency:
A) I can write the item to all 3 replicas and then read from any one of the 3 and be confident I'm getting the latest version
B) I can write item to one of the replicas, and then read all 3 replicas and choose the last of the 3 results
C) I can write to 2 out of the 3 replicas, and read from 2 out of the 3 replicas, and I am guaranteed that I'll have the latest version on one of them.
Of course, the rule above assumes that no nodes have gone down in the meantime. To ensure P + C you will need to be even more paranoid...
There are also a near-infinite number of 'implementation' hacks - for example the storage layer might fail the call if it can't write to a minimal quorum, but might continue to propagate the updates to additional nodes even after returning success. Or, it might loosen the semantic guarantees and push the responsibility of merging versioning conflicts up to the business layer (this is what Amazon's Dynamo did).
Different subsets of data can have different guarantees (ie single point of failure might be OK for critical data, or it might be OK to block on your write request until the minimal # of write replicas have successfully written the new version)
There is more to talk about, but let me know if this was helpful and if you have any followup questions, we can continue from there...
[Continued...]
The patterns for solving the 90% case already exist, but each NoSQL solution applies them in different configurations. The patterns are things like partitioning (stable/hash-based or variable/lookup-based), redundancy and replication, in memory-caches, distributed algorithms such as map/reduce.
When you drill down into those patterns, the underlying algorithms are also fairly universal: version vectors, merckle trees, DHTs, gossip protocols, etc.
The same can be said for most SQL solutions: they all implement indexes (which use b-trees under the hood), have relatively smart query optimizers which are based on known CS algorithms, all use in-memory caching to reduce disk IO. The differences are mostly in implementation, management experience, toolset support, etc
unfortunately I can't point to some central repository of wisdom which contains all you will need to know. In general, start with asking yourself what NoSQL characteristics you really need. That will guide you to choosing between a key-value store, a document store or a column store. (those are the 3 main categories of NoSQL offerings). And from there you can start comparing the various implementations.
[Updated again 4/14/2011]
OK here's the part which actually justifies the bounty..
I just found the following 120 page whitepaper on NoSQL systems. This is very close to being the "NoSQL bible" which I told you earlier doesn't exist. Read it and rejoice :-)
NoSQL Databases, Christof Strauch
There are many applications where eventual consistency is fine. Consider Twitter as a rather famous example. There's no reason that your "tweets" have to go out to all of your "followers" instantaneously. If it takes several seconds (or even minutes?) for your "tweet" to be distributed, who would even notice?
If you want non-web examples, any store-and-forward service (like email and USENET) would be require eventual consistency.
It's not impossible to get transactions or consistency in NoSQL. A lot of people define NoSQL in terms of a lack of transactions or as requiring eventual consistency at best, but this isn't accurate. There are transactional nosql products out there - consider tuple spaces, for example - that scale very well even while providing app consistency.

Decision table with large number of conditions and actions

If the number of conditions and actions is high (in my case, 12 conditions and 13 actions respectively!), making/maintaining a decision table with hand is proving to be really tough. The number of possible rules in the case at hand is huge (Y/N for 11 conditions and a 3-way choice for the 12th) and it's freaking me out. Also, these conditions and actions cannot be collapsed/coalesced; they are all needed very much.
What could be a better alternative to a decision table? What are some popular free tools to model the same?
Thanks so much.
Have a look at ROBDDs.
Statestep might be what you're looking for. Really powerful for dealing with large numbers of possibilities. Not normally free though, unless for education etc.

Do software metrics work both ways

I just started working for a large company. in a recent internal audit, measuring metrics such as Cyclomatic complexity and file sizes it turned out that several modules including the one owned by my team have a very high index. so in the last week we have been all concentrating on lowering these indexes for our code. by removing decision points and splitting files.
maybe I am missing something being the new guy but, how will this make our software better?, I know that software metrics can measure how good your code is, but dose it work the other way around? will our code become better just because for example we are making a 10000 lines file into 4 2500 lines files?
The purpose of metrics is to have more control over your project. They are not a goal on their own, but can help to increase the overall quality and/or to spot design disharmonies. Cyclomatic complexity is just one of them.
Test coverage is another one. It is however well-known that you can get high test coverage and still have a poor test suite, or the opposite, a great test suite that focus on one part of the code. The same happens for cyclomatic complexity. Consider the context of each metrics, and whether there is something to improve.
You should try to avoid accidental complexity, but if the processing has essential complexity, you code will anyway be more complicated. Try then to write mainteanble code with a fair balance between the number of methods and their size.
A great book to look at is "Object-oriented metrics in practice".
It depends how you define "better". Smaller files and less cyclomatic complexity generally makes it easier to maintain. Of course the code itself could still be wrong, and unit tests and other test methods will help with that. It's just a part of making code more maintainable.
Code is easier to understand and manage in smaller chunks.
It is a good idea to group related bits of code in their own functional areas for improved readability and cohesiveness.
Having a whole large program all in a single file will make your project very difficult to debug, extend, and maintain. I think this is quite obvious.
The particular metric is really only a rule of thumb and should not be followed religiously, but it may indicate something is not as nice as it could be.
Whether legacy working code should be touched and refactored is something that needs to be evaluated. If you decide to do so, you should consider writing tests for it first, that way you'll quickly know whether your changes broke any required behavior.
Never ever opened one of your own projects after several months again? The larger and more complex the single components are the more one asks oneself, what genious wrote that code and why the heck he wrote it that way.
And, there's never too much or even enough documentation. So if the components themself are lesser complex and smaller, its easier to re-understand 'em
This is bit Subjective. The idea of assigning a maximim Cyclomatic complexity index is to improve the maintainability and the readability of the code.
As an example in the perspective of the unit testing, it is really convenient to have smaller "units". And avoiding the long codes will help the reader to understand the code. You cannot ensure that the original developer works on the code forever so in the company's perspective it is fair to assign such a criteria to keep the code "simple"
It is easy to write a code that can undertand by a computer. It is more harder to write a code that can understood by a human.
how will this make our software better?
Excerpt from the articles Fighting Fabricated Complexity related to the tool for .NET developers NDepend. NDepend is good at helping team to manage large and complex code base. The idea is that code metrics are good are reducing fabricated complexity in the code implementation:
During my interview on Code Metrics by Scott Hanselman’s on Software Metrics, Scott had a particularly relevant remark.
Basically, while I was explaining that long and complex methods are killing quality and should be split into smaller methods, Scott asked me:
looking at this big too complicated
method and I break it up into smaller
methods, the complexity of the
business problem is still there,
looking at my application I can say,
this is no longer complex from the
method perspective, but the software
itself, the way it is coupled with
other bits of code, may indicate other
problem…
Software complexity is a subjective measure relative to the human cognition capacity. Something is complex when it requires effort to be understood by a human. The fact is that software complexity is a 2 dimensional measure. To understand a piece of code one must understand both:
what this piece of code is supposed to do at run-time, the behavior of the code, this is the business problem complexity
how the actual implementation does achieve the business problem, what was the developer mental state while she wrote the code, this is the implementation complexity.
Business problem complexity lies into the specification of the program and reducing it means working on the behavior of the code itself. On the other hand, we are talking of fabricated complexity when it comes to the complexity of the implementation: it is fabricated in the sense that it can be reduced without altering the behavior of the code.
how will this make our software better?
It can be a trigger for a refactoring, but following one metric doesn't guarantee that all other quality metrics stay the same. And tools are only able to follow very few metrics. You can't measure to which degree code is understandable.
Will our code become better just
because for example we are making a
10 000 lines file into 4 2500 lines
files?
Not necessarily. Sometimes the larger one can be more understandable, better structured and have lesser bugs.
Most design patterns for example "improve" your code by making it more general and maintenable, but often with the cost of added source lines.