NDepend: What is "Resilience to Change"

Whilst evaluating NDepend, I attached an NDepend project to all the Visual Studio projects that comprise our company's software suite. Particularly striking is the result of the Abstractness vs Instability graph: almost all of the projects are crammed into the bottom-right corner of the graph, indicating a very high level of "instability".
The NDepend documentation's definition of instability is:
The ratio of efferent coupling (Ce) to total coupling. I = Ce / (Ce + Ca). This metric is an indicator of the assembly's resilience to change. The range for this metric is 0 to 1, with I=0 indicating a completely stable assembly and I=1 indicating a completely instable assembly.
However, I have been unable to find a clear definition of "resilience to change" in this context. Would anyone like to try to produce a definition?
Added
Obviously, the sentence in which "resilience to change" appears gives a loose definition of this concept as the "ratio of efferent coupling (Ce) to total coupling". But that leaves open the question of what the significance of this ratio is and how it relates to change.

See the documentation in the report: stable means painful to modify, hence instability is a positive thing; it means the assembly can be changed with little pain, i.e. the assembly is resilient to change.
Excerpt from the documentation in the report: Abstractness versus Instability Diagram
The Abstractness versus Instability Diagram helps to detect which assemblies are potentially painful to maintain (i.e. concrete and stable) and which assemblies are potentially useless (i.e. abstract and instable).
Abstractness: If an assembly contains many abstract types (i.e. interfaces and abstract classes) and few concrete types, it is considered abstract.
Stability: An assembly is considered stable if its types are used by a lot of types of tier assemblies. Under these conditions, stable means painful to modify.
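To make the metric concrete, here is a minimal sketch of how I = Ce / (Ce + Ca) behaves at the two extremes; the assembly names and coupling counts are made up for illustration:

    // Instability as defined above: I = Ce / (Ce + Ca).
    // Assembly names and coupling counts are hypothetical.
    case class Assembly(name: String, ce: Int, ca: Int) {
      def instability: Double =
        if (ce + ca == 0) 0.0 else ce.toDouble / (ce + ca)
    }

    val core = Assembly("Core", ce = 1, ca = 40) // used by many: I ≈ 0.02, stable, painful to modify
    val app  = Assembly("App", ce = 25, ca = 0)  // uses many, used by none: I = 1.0, instable, easy to change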

Related

In multi-stage compilation, should we use a standard serialisation method to ship objects through stages?

This question is formulated in Scala 3/Dotty but should generalise to any language NOT in the MetaML family.
The Scala 3 macro tutorial:
https://docs.scala-lang.org/scala3/reference/metaprogramming/macros.html
starts with the Phase Consistency Principle, which explicitly states that free variables defined in one compilation stage CANNOT be used by the next stage, because their bound objects cannot be persisted to a different compiler process:
... Hence, the result of the program will need to persist the program state itself as one of its parts. We don’t want to do this, hence this situation should be made illegal
This should be considered a solved problem, given that many distributed computing frameworks demand a similar capability to persist objects across multiple computers. The most common kind of solution (as observed in Apache Spark) uses standard serialisation/pickling to create snapshots of the bound objects (Java standard serialization, Twitter's Kryo/Chill), which can be saved to disk/off-heap memory or sent over the network.
The tutorial itself also suggests this possibility twice:
One difference is that MetaML does not have an equivalent of the PCP - quoted code in MetaML can access variables in its immediately enclosing environment, with some restrictions and caveats since such accesses involve serialization. However, this does not constitute a fundamental gain in expressiveness.
In the end, ToExpr resembles very much a serialization framework
Instead, both Scala 2 and Scala 3 (and their respective ecosystems) largely ignore these out-of-the-box solutions and only provide default methods for primitive types (Liftable in Scala 2, ToExpr in Scala 3). In addition, existing libraries that use macros rely heavily on manual definition of quasiquotes/quotes for this trivial task, making the source much longer and harder to maintain, while not making anything faster (JVM object serialisation is a highly optimised language component).
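To illustrate the boilerplate in question, here is a minimal hand-written ToExpr instance for a made-up Point class; this is a sketch of the pattern, and real libraries need one such instance per type they want to persist into generated code:

    import scala.quoted.*

    // Hypothetical domain type. The ToExpr instance tells the compiler
    // how to persist a Point value computed at compile time into the
    // generated code, by spelling it out as a constructor call.
    case class Point(x: Int, y: Int)

    given ToExpr[Point] with
      def apply(p: Point)(using Quotes): Expr[Point] =
        '{ Point(${ Expr(p.x) }, ${ Expr(p.y) }) }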
What's the cause of this status quo? How do we improve it?

How to show a currently unknown number of libraries in a Deployment Diagram

My system is still at the concept stage, but I'm creating a Deployment Diagram to focus discussions about the end result. The system will have some run-time subroutine libraries. At this stage there is no way to know how many there will be; in fact the number may never stabilize, since it will depend on various other aspects of the system.
A library is an artifact with stereotype "library" in guillemets. Several ways to diagram the unknown number come to mind:
Show one library, with a note attached saying that the number of libraries is unknown
Put an asterisk at the upper right of the library artifact, as with composite structure diagrams
Draw a slanted stack of overlapping artifacts
Don't diagram them individually: just list "library 1, library 2, library 3..."
Use a "manifest" relationship a between an artifact and an asterisked-component.
And there must be lots of other creative possibilities.
Does anyone know if there's a UML-prescribed way to show this unknown number of libraries on a Deployment Diagram, and if so, what it is?
I think you need to separate your logical Deployment Model (abstract) that shows classes of Artefacts, Components, Nodes etc. from your physical Deployment Model(s) (concrete) that show specific instances of those Artefacts, Components and Nodes etc. At the logical level, you could show an abstract Artefact called "Library" that has associations with plural multiplicities to Nodes, Components etc. in your logical design. Then, you could specialise that abstract Artefact, and at the concrete Deployment Model(s) level, create instances of these Library specialisations that are associated with instances of Nodes, Components etc.
Something like: [diagram omitted: abstract "Library" Artefact specialised into concrete instances]

HIS-Metric "calling"

I do not understand the reason for this metric/rule:
A function should not be called from more than 5 different functions.
All calls within the same function are counted as 1. The rule is
limited to translation unit scope.
It appears to me completely counter-intuitive, because it contradicts code reuse and the approach of splitting code into frequently used functions instead of duplicating code.
Can someone explain the rationale?
The first thing to say is that metric-based quality approaches are by their nature a little subjective and approximate; there are no absolutes in following a metric approach to delivering good-quality code.
There are two factors to consider in software complexity. One is the internal complexity, expressed by decision complexity within each function (best exemplified by the Cyclomatic Complexity measure) and dependency complexity between functions within the container (Translation Unit or Class). The other is interface complexity, measuring the level of dependency, including cyclic ones, between collaborating and hierarchical components or classes. In the C/C++ world, this is across multiple TUs. In Structure101 terms, the internal form of complexity is called “Fat” and the external form called “Tangles”.
Back to your question: this Hersteller Initiative Software 'CALLING' metric targets internal complexity (Fat). Their argument appears to be that if you have more than 5 points of reference to a single function, there may be too much implementation logic in that C++ class or C implementation file, and it is therefore perhaps time to break it into separate modules or components. It seems like a peculiarly stunted view of software design and structure, and the list of exceptions may be as long as the list of areas where such a judgement might apply.
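As a sketch of what the check itself amounts to (the call graph and function names here are invented), the rule simply counts distinct callers per callee within one translation unit:

    // callers maps each function to the set of DISTINCT functions that
    // call it; multiple calls from the same function count once, as the
    // rule specifies.
    def callingViolations(callers: Map[String, Set[String]], limit: Int = 5): Map[String, Int] =
      callers.collect { case (callee, from) if from.size > limit => callee -> from.size }

    val graph = Map("logError" -> Set("parse", "validate", "emit", "link", "report", "init"))
    // callingViolations(graph) == Map("logError" -> 6): six distinct callers > 5, rule violated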

Analysing and generating statistics on your code

I was wondering if anyone had any ideas or procedures for generating general statistics on your source code.
Off the top of my head, I would love to know how many functions in my project's code are called once or only a few times, or which classes are only instantiated once.
I'm sure there is a ton of other interesting things to be found out.
I could do something like the above using grep magic, but has anyone come across tools or tips?
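To illustrate the grep-magic approach the question mentions, here is a rough sketch that counts identifier(...) occurrences across source files and reports names that appear at most twice. It is regex-based and therefore only an approximation (the file extension and directory handling are assumptions); a real tool would parse the code:

    import scala.io.Source
    import java.io.File

    // Count identifier(...) occurrences across the sources and report
    // names that appear at most twice (roughly: the definition plus at
    // most one call site).
    def rarelyCalled(dir: File): Seq[(String, Int)] = {
      val call = raw"\b([A-Za-z_]\w*)\s*\(".r
      val names = for {
        f <- Option(dir.listFiles).getOrElse(Array.empty[File]).toSeq
        if f.getName.endsWith(".scala")
        m <- call.findAllMatchIn(Source.fromFile(f).mkString)
      } yield m.group(1)
      names.groupBy(identity).map { case (n, hits) => n -> hits.size }
        .filter(_._2 <= 2).toSeq.sortBy(_._2)
    }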
Coverity is the first thing that comes to mind. It currently offers (in one of their products):
Software DNA Map™ analysis system: Generates a comprehensive representation of the entire build system including a semantically correct parsing of every line of code.
Defect Manager: Intuitive interface makes it easy to establish ownership of defects and resolve them via a customized workflow that mirrors your existing development process.
Local Analysis: Enables code to be analyzed locally on developers’ desktops to ensure quality before sharing with other developers.
Boolean Satisfiability: Translates the code into questions based on Boolean values, then applies SAT solvers for the most accurate defect detection and the lowest false positive rate available. Only Prevent offers the added precision of this proprietary method.
Race Conditions Checker: Features an industry-first race conditions checker built specifically for today’s complex multi-threaded applications.
Path Simulation: Simulates 100% of all values and data paths, enabling detection of the most critical defects.
Statistical & Interprocedural Analysis: Ensures a comprehensive analysis of your entire build system by inferring correct behavior based on previously observed behavior and performing whole-program analysis similar to the executing Bin.
False Path Pruning: Efficiently removes false positives to give Prevent an average FP rate of about 15%, with some users reporting FP rates of as low as 5%.
Incremental Analysis: Analyzes source code wholly or incrementally, allowing you to save time by checking only those components that are affected by a change.
Reporting: Measures software quality trends over time via customizable reporting so you can show defects grouped by checker, classification, component, and other defect information.
There are lots of tools that do this, but AFAIK none of them is language-independent (which would be mostly impossible anyway; some languages might not even have functions).
Generally you will find those tools under the categories of "code coverage tools" or "profilers".
For .NET you can use Visual Studio or the CLR Profiler.

Is there a standard definition of what constitutes a version(revision) change

I am currently in bureaucratic hell at my company and need to define what constitutes the different levels of software change to our test programs. We have a rough practice that we follow internally, but I am looking for a standard (if it exists) to reference in our Quality system. I recognize that systems may vary greatly between developers, but ultimately I am looking for a "best practice" guide to what constitutes a major change, a minor change etc. I would like to reference a published doc in my submission to our quality system for ISO purposes if possible.
To clarify: the software developed at my company is used internally for test automation of semiconductors. We are not selling this code, and versioning is really for record keeping only. We are using the x.y.z level of a change to set the degree of sign-off and approval needed for release.
A good practice is to use three-level revision numbers:
x.y.z
x is the major version
y is the minor version
z is the bug-fix level
The important thing is that two different software versions with the same x should have binary compatibility. A software version with a y greater than another, but the same x may add features, but not remove any. This ensures portability within the same major number. And finally z should not change any functional behavior except for bug fixes.
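A small sketch of those rules (this encodes the scheme described above, not any particular standard):

    // Same x -> binary compatible; within a major version, a higher y
    // may add features but must never remove any.
    case class Version(x: Int, y: Int, z: Int)

    def binaryCompatible(a: Version, b: Version): Boolean =
      a.x == b.x

    // Code built against `old` should still find every feature in `neu`.
    def safeUpgrade(old: Version, neu: Version): Boolean =
      old.x == neu.x && neu.y >= old.y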
Edit:
Here are some links to revision-numbering schemes in use:
http://en.wikipedia.org/wiki/Software_versioning
http://apr.apache.org/versioning.html
http://www.advogato.org/article/40.html
I would add build number to the x.y.z format:
x.y.z.build
x = major feature change
y = minor feature change
z = bug fixes only
build = incremented every time the code is compiled
Including the build number is crucial for internal purposes where people are trying to figure out whether or not a particular change is in the binaries that they have.
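A sketch of why that helps (field names are illustrative): with a monotonically increasing build number, "is change N in this binary?" reduces to a single comparison:

    // The build number increments on every compile, independent of x.y.z.
    case class Build(x: Int, y: Int, z: Int, build: Int)

    // A change that landed in build N is present in any later build.
    def containsChange(binary: Build, landedInBuild: Int): Boolean =
      binary.build >= landedInBuild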
To enlarge on what lewap said, use
x.y.z
where z-level changes are almost entirely bug fixes that don't change interfaces or involve external systems;
where y-level changes add functionality and may change the UI/API interface, in addition to fixing more serious bugs that may involve external systems;
where x-level changes involve anything from a complete rewrite/redesign to just changing the database structures or changing databases (i.e. from Oracle to SQL Server) - in other words, anything that isn't a drop-in change and requires a "port" or "conversion" process.
I think it might differ if you are working on internal software vs. external software (a product).
For internal software it will almost never be a problem to use a formally defined scheme. However, for a product the version or release number is in most cases a commercial decision that does not reflect any technical or functional criteria.
In our company the x.y in an x.y.z numbering scheme is determined by the marketing boys and girls. The z and the internal build number are determined by the R&D department and track back into our revision control system and are related to the sprint in which it was produced (sprint is the Scrum term for an iteration).
In addition, formally defining some level of compatibility between versions and releases could cause the numbers to move up very rapidly, or hardly at all, which might not reflect added functionality.
I think the best approach for you to take when explaining this to your co-workers is by examples drawn from well-known and successful software packages, and the way they approach major and minor releases.
The first thing I would say is that the major.minor dot notation for releases is a relatively recent invention. For example, most releases of UNIX actually had names (which sometimes included a meaningless number) rather than version numbers.
But assuming you want to use major.minor numbering, then the major number indicates a version that is basically incompatible with much that went before. Consider the change from Windows 2.0 to 3.0 - most 2.0 applications simply didn't fit in with the new overlapped windows in Windows 3.0. For less all-encompassing apps, a radical change in file formats (for example) could be a reason for a major version change - WP and graphics apps often work this way.
The other reason for a major version number change is that the user notices a difference. Once again this was true for the change from Windows 2.0 to 3.0 and was responsible for the latter's success. If your app looks very different, that's a major change.
As for the minor version number, this is typically used to indicate a change that actually is quite major, but that won't be noticeable to the user. For example, the differences internally between Win 3.0 and Win 3.1 were actually quite major, but the interface stayed the same.
Regarding the third version number, well, few people know what it really means and fewer care. For example, in my everyday work I use the GNU C++ compiler version 3.4.5 - how does this differ from 3.4.4? I haven't a clue!
As I said before in an answer to a similar question: the terminology used is not very precise. There is an article describing the five relevant dimensions. Data-management tools for software development don't tend to support more than three of them consistently at the same time. If you want to support all five, you have to describe a development process:
Version (semantics: modification)
View (semantics: equivalence, derivation)
Hierarchy (semantics: consists of)
Status (semantics: approval, accessibility)
Variant (semantics: product variations)
Peter van den Hamer and Kees Lepoeter (1996), "Managing Design Data: The Five Dimensions of CAD Frameworks, Configuration Management, and Product Data Management", Proceedings of the IEEE, Vol. 84, No. 1, January 1996.