How do you define 'unwanted code'? - code-metrics

How would you define "unwanted code"?
Edit:
IMHO, Any code member with 0 active calling members (checked recursively) is unwanted code. (functions, methods, properties, variables are members)

Here's my definition of unwanted code:
A code that does not execute is a dead weight. (Unless it's a [malicious] payload for your actual code, but that's another story :-))
A code that repeats multiple times is increasing the cost of the product.
A code that cannot be regression tested is increasing the cost of the product as well.
You can either remove such code or refactor it, but you don't want to keep it as it is around.

0 active calls and no possibility of use in near future. And I prefer to never comment out anything in case I need for it later since I use SVN (source control).

Like you said in the other thread, code that is not used anywhere at all is pretty much unwanted. As for how to find it I'd suggest FindBugs or CheckStyle if you were using Java, for example, since these tools check to see if a function is used anywhere and marks it as non-used if it isn't. Very nice for getting rid of unnecessary weight.

Well after shortly thinking about it I came up with these three points:
it can be code that should be refactored
it can be code that is not called any more (leftovers from earlier versions)
it can be code that does not apply to your style-guide and way-of-coding
I bet there is a lot more but, that's how I'd define unwanted code.

In java i'd mark the method or class with #Deprecated.

Any PRIVATE code member with no active calling members (checked recursively). Otherwise you do not know if your code is not used out of your scope analysis.

Some things are already posted but here's another:
Functions that almost do the same thing. (only a small variable change and therefore the whole functions is copy pasted and that variable is changed)

Usually I tell my compiler to be as annoyingly noisy as possible, that picks 60% of stuff that I need to examine. Unused functions that are months old (after checking with the VCS) usually get ousted, unless their author tells me when they'll actually be used. Stuff missing prototypes is also instantly suspect.
I think trying to implement automated house cleaning is like trying to make a USB device that guarantees that you 'safely' play Russian Roulette.
The hardest part to check are components added to the build system, few people notice those and unused kludges are left to gather moss.
Beyond that, I typically WANT the code, I just want its author to refactor it a bit and make their style the same as the rest of the project.
Another helpful tool is doxygen, which does help you (visually) see relations in the source tree.. however, if its set at not extracting static symbols / objects, its not going to be very thorough.

Related

Progress-gl - What's benefit of placing variable declaration on top of the procedure

I've been doing Progress 4GL for 8 years though it's not my main responsibility. I do C++ and Java a lot more. When programming in other language it's suggested to have the declaration close to the usage. With 4GL however I see people place the declaration on top of the file. It's even in the coding standard.
I think placing them on top of them file would lead to 'vertical separation' problem. In most other language it's even suggested to do the assignment at the same line as the declaration.
The question is why it's suggested to do so in 4GL ? What's the benefit ? I know that it's possible to place the declaration anywhere in the file, given that it's declared before it is used.
I think the answer is to do with scoping, or the lack of it, within Progress 4GL.
If you are used to Java, say, and read a Progress 4GL program, that looks like
DO:
DEFINE VARIABLE x AS INTEGER INITIAL 4.
DISPLAY x.
END.
then you wouldn't expect to be able to use this value of x anywhere else in the program, and that any changes made in the block, wouldn't effect anything outside the block.
As I understand it, all progress variables declared within the body of a program are scoped to the whole program, unless they are declared are within an internal procedure or function, in which case they are scoped to the procedure or function.
(Incidentally any default buffers [i.e. undeclared] you use within an internal procedure/function are scoped to the whole program, not just the procedure or function, so you need to be very careful to explicity declare buffers in functions you intend ot use recursively).
I therefore think the convention of declaring variables at the beginning of a program is in order to reflect the fact that Progress will treat them has having been done so, regardless of where you put the declaration.
There is absolutely no benefit in scoping anything to the program as a whole when it could be scoped smaller.
Smaller scopes are easier to test, give less possibility of namespace conflict, and less opportunity for error.
Tightly scoped named buffers are especially useful when writing to the database because they eliminate the possibility of there ever being some other part of your code that uses the same buffer and causes a share-lock, i.e., this fails to compile:
do for b-customer transaction:
find b-customer where .... exclusive...
...
end.
...
find b-customer...
On the other hand, procedures and functions (and include files...) that share scope with the main body of code are a major source of bugs, because when you pick up your variable or whatever, you can never be entirely certain where it has been...
All of this is just basic Structured Programming, of course. It's true for every language and has been accepted since the 70's.
The "reason" that you usually see variables defined at the top is simple. Habit. That is just how things were done in the bad old days.
A lot of old code, or code written by old fossils, is written that way. No matter the language.
Some languages (COBOL springs to mind) even formalized it.
Is there any advantage to such an approach?
Not especially. I guess you could argue "they are all in one place and easy to find" but that isn't very compelling.
"Habit" is actually more compelling ;) If you are working with a team that expects a certain style or in an application where a particular style is prevalent then you should think twice before unilaterally throwing out a new way of doing things - the confusion could be a bigger problem than the advantages gained.

Object class members as pointers to avoid #include in headers - is it good practice?

This is really a question of precedence: which is more preferred in C++, avoiding pointers or avoiding #includes in header files?
"Don't Use #include in header files."
There seems to be some ambiguity based on my research. In this SO question, the top answer says "...make sure you actually need an include, [don't use one] when a forward declaration or even leaving it out completely will do." (From Header files and include best practice)
And this article explains the negative effect excess header inclusions can have on compile-time: http://blog.knatten.org/2012/11/09/another-reason-to-avoid-includes-in-headers/
As well as this tutorial, stating, "...you should try to put all of your code in the CPP class and only the class declaration in the HPP file.": https://github.com/LaurentGomila/SFML/wiki/Tutorial%3A-Basic-Game-Engine#wiki-declarations
"Don't Use Pointers."
But, there is also evidence that pointers should be avoided most often as well:
c++: when to use pointers?
https://softwareengineering.stackexchange.com/questions/56935/why-are-pointers-not-recommended-when-coding-with-c
Which preference takes precedence?
If my understanding about avoiding #includes in header files is correct, this can easily be done by changing things like class members to pointers so I can use a forward declaration instead, but is this a good idea for class members whose lifetime only lasts as long as the class itself?
It's not really an "one or the other". Both statements are true, but you need to understand the reasoning behind them.
tl;dr: Use forward declaration where possible to reduce compile time. Use stack objects or references as much as possible and pointers only in rare cases.
"Don't Use #include in header files."
This is a rather general statement, which as is, would be wrong. The more important part behind this statement actually is: "Use forward declarations where ever possible". Includes in header files are not something bad per se, but they often aren't needed either.
Forward declarations can be used, if the included type/class/etc. is used as a pointer in the new type/class/etc. declaration within the given header. Forward declaration just tells the compiler: "Somewhere a long the way you'll find the actual declaration of type X." The include can even be removed if the type isn't used at all in the declaration. The reason is that the compiler doesn't need to know anything about these types to calculate the required memory layout for the new type. For example a pointer has "always" the same size. Including the file additionally in the header, would potentially only waste processing power, since the compiler would have to open and parse the file, thus adding expensive seconds to the compile time. So in most cases you'll do yourself a favor by reducing the unnecessary includes in the header files and instead use forward declaration.
For the sake of completion: Forward declaration are explicitly needed if you get circular references (class A depends on class B, which depends on class C, which depends on class A). However this can often also reveal either bad design and/or old/outdate coding standards which would lead us to the second topic.
"Don't use pointers."
Again the statement is a tiny bit too general. One might rather want to say: "Don't use raw pointers."
With C++11 and soon C++1y the language itself has changed a lot. As much bad C++ books the world has seen, the more outdated C++ books float around nowadays (here's a good list however). While in the past we were mostly stuck with pointers new and delete for memory management, we've evolved to better, more readable, less risk and 100% memory leak free ways to manage the data in memory. One of the magic words is RAII - since you linked something from SFML above, here's a nice demonstration of the power of RAII. I see many people use pointers and new and delete just because or maybe because they are thinking in Java or C# terms were objects get instantiated with the new keyword. In C++ however object don't need to use new to be allocated and it's mostly preferable to run things on the stack instead of the heap. This works for many, many things, especially when using STL containers, which will hide the dynamic management in the background. The usage of the heap is mostly all cases only preferable if you need the data to be dynamic, non "local" or you need a lot of it. However when you use the heap, make sure to use smart pointers such as std::unique_ptr or std::shared_ptr depending on the use case, but certainly not raw pointers. In modern C++ raw pointers should never own an object anymore. There are cases where it's okay to return a raw pointer to reference an object, but there's really no reason in modern C++ to call new on a raw pointer.
Lets get back to the original question though. The "Don't use raw pointers" is essentially more of a design question and quite unrelated to the whole header issue. While there might be some cases where you'll have to switch to raw pointers, due to circular references, the use of forward declarations is otherwise just about compilation time (and maybe clean code), but it's not as essential for the programming itself.
In short: Don't use raw pointers to avoid inclusions in header files, but use forward declaration where ever possible and utilize smart pointers as much as possible.

Design - When to create new functions?

This is a general design question not relating to any language. I'm a bit torn between going for minimum code or optimum organization.
I'll use my current project as an example. I have a bunch of tabs on a form that perform different functions. Lets say Tab 1 reads in a file with a specific layout, tab 2 exports a file to a specific location, etc. The problem I'm running into now is that I need these tabs to do something slightly different based on the contents of a variable. If it contains a 1 I may need to use Layout A and perform some extra concatenation, if it contains a 2 I may need to use Layout B and do no concatenation but add two integer fields, etc. There could be 10+ codes that I will be looking at.
Is it more preferable to create an individual path for each code early on, or attempt to create a single path that branches out only when absolutely required.
Creating an individual path for each code would allow my code to be extremely easy to follow at a glance, which in turn will help me out later on down the road when debugging or making changes. The downside to this is that I will increase the amount of code written by calling some of the same functions in multiple places (for example, steps 3, 5, and 9 for every single code may be exactly the same.
Creating a single path that would branch out only when required will be a bit messier and more difficult to follow at a glance, but I would create less code by placing conditionals only at steps that are unique.
I realize that this may be a case-by-case decision, but in general, if you were handed a previously built program to work on, which would you prefer?
Edit: I've drawn some simple images to help express it. Codes 1/2/3 are the variables and the lines under them represent the paths they would take. All of these steps need to be performed in a linear chronological fashion, so there would be a function to essentially just call other functions in the proper order.
Different Paths
Single Path
Creating a single path that would
branch out only when required will be
a bit messier and more difficult to
follow at a glance, but I would create
less code by placing conditionals only
at steps that are unique.
Im not buying this statement. There is a level of finesse when deciding when to write new functions. Functions should be as simple and reusable as possible (but no simpler). The correct answer is almost never 'one big file that does a lot of branching'.
Less LOC (lines of code) should not be the goal. Readability and maintainability should be the goal. When you create functions, the names should be self documenting. If you have a large block of code, it is good to do something like
function doSomethingComplicated() {
stepOne();
stepTwo();
// and so on
}
where the function names are self documenting. Not only will the code be more readable, you will make it easier to unit test each segment of the code in isolation.
For the case where you will have a lot of methods that call the same exact methods, you can use good OO design and design patterns to minimize the number of functions that do the same thing. This is in reference to your statement "The downside to this is that I will increase the amount of code written by calling some of the same functions in multiple places (for example, steps 3, 5, and 9 for every single code may be exactly the same."
The biggest danger in starting with one big block of code is that it will never actually get refactored into smaller units. Just start down the right path to begin with....
EDIT --
for your picture, I would create a base-class with all of the common methods that are used. The base class would be abstract, with an abstract method. Subclasses would implement the abstract method and use the common functions they need. Of course, replace 'abstract' with whatever your language of choice provides.
You should always err on the side of generalization, with the only exception being early prototyping (where throughput of generating working stuff is majorly impacted by designing correct abstractions/generalizations). having said that, you should NEVER leave that mess of non-generalized cloned branches past the early prototype stage, as it leads to messy hard to maintain code (if you are doing almost the same thing 3 different times, and need to change that thing, you're almost sure to forget to change 1 out of 3).
Again it's hard to specifically answer such an open ended question, but I believe you don't have to sacrifice one for the other.
OOP techniques solves this issue by allowing you to encapsulate the reusable portions of your code and generate child classes to handle object specific behaviors.
Personally I think you might (if possible by your API) create inherited forms, create them on fly on master form (with tabs), pass agruments and embed in tab container.
When to inherit form and when to decide to use arguments (code) to show/hide/add/remove functionality is up to you, yet master form should contain only decisions and argument passing and embeddable forms just plain functionality - this way you can separate organisation from implementation.

Is it naughty to have a large utility file?

In my C project I have quite a large utils.c file. It is really full of many utilities of different sorts. I feel a bit naughty just stuffing different miscellaneous functions in there. For example it has some utilities related to low level stuff such as a lowercase() function, and it also has some quite sophisticated utilities such as converting to/from different colour formats.
My question is, is it very naughty to have such a large utils.c with many different types of utilities in it? Should I break it up into many different kinds of utility files? Such as graphics_utils.c and so on What do you think?
Breaking them up into separate files based on categories (ie graphics, strings, etc.) will lead to better organization, making it easier to locate certain pieces of code, having smaller files to go through, instead of just one large file.
You want to break it up, not just for organizational reasons, but because you will have many other files that depend on this one. Because everything will depend on this file, it makes this one file difficult to change because it might cause widespread breakage.
http://ifacethoughts.net/2006/04/15/stable-dependencies-principle/
If it's just you that will EVER maintain the stuff, it's a matter of when the complexity gets to the point where you find yourself searching for things. That would be the time to refactor and reorganize (there's a cost to reorganize, just as there's a cost to not reorganize).
If it's POSSIBLE that anyone else will maintain a project that includes your utils, you have to consider THEIR pain point when deciding when to reorganize. Theirs is MUCH lower than yours.
I tend to break them up into various sub-utils as you say (graphics_utils) when it becomes appropriate.
Break it up. Stuff will be easier to find, easier to reuse, easier to refactor, easier to unit test. I recently needed to get a set of ISO-8601 date handling methods out of a ginormous Java utility class of static methods, and it was really hard to find the 5% of the code I needed.
It is definitely not kosher, because the next guy coming through your code won't know where to look for anything. Break it up by function, and your coworkers will thank you!
Another advantage that comes from breaking up the file into separates is that when you place it under source control, you can have finer grained control. This really is useful if you have bits that are tweaked/extended/specialised frequently, and other bits that are relatively stable.
Another point: You should organize your code, i. e. break it up in smaller modules and categorize it, because at some point in time you will end up writing a second and third function for the same thing, simply for the reason that you wont find that function that you knew it was there, but you don't remember it's name.
I've got a (rather large) project with such a module and there is programming logic for which there are up to 5-6 implementations (for the same thing).
Like everyone else I would break them up. But I tend to use Extension Methods now, so I would have one class (and one file) per class being extended (e.g. StringExtensions, SqlDataReaderExtensions, etc). I find this tends to break up the utility methods nicely.

Does it matter if there are unused functions I put into a big CoolFunctions.h / CoolFunctions.m file that's included everywhere in my project?

I want to create a big file for all cool functions I find somehow reusable and useful, and put them all into that single file. Well, for the beginning I don't have many, so it's not worth thinking much about making several files, I guess. I would use pragma marks to separate them visually.
But the question: Would those unused methods bother in any way? Would my application explode or have less performance? Or is the compiler / linker clever enough to know that function A and B are not needed, and thus does not copy their "code" into my resulting app?
This sounds like an absolute architectural and maintenance nightmare. As a matter of practice, you should never make a huge blob file with a random set of methods you find useful. Add the methods to the appropriate classes or categories. See here for information on the blob anti-pattern, which is what you are doing here.
To directly answer your question: no, methods that are never called will not affect the performance of your app.
No, they won't directly affect your app. Keep in mind though, all that unused code is going to make your functions file harder to read and maintain. Plus, writing functions you're not actually using at the moment makes it easy to introduce bugs that aren't going to become apparent until much later on when you start using those functions, which can be very confusing because you've forgotten how they're written and will probably assume they're correct because you haven't touched them in so long.
Also, in an object oriented language like Objective-C global functions should really only be used for exceptional, very reusable cases. In most instances, you should be writing methods in classes instead. I might have one or two global functions in my apps, usually related to debugging, but typically nothing else.
So no, it's not going to hurt anything, but I'd still avoid it and focus on writing the code you need now, at this very moment.
The code would still be compiled and linked into the project, it just wouldn't be used by your code, meaning your resultant executable will be larger.
I'd probably split the functions into seperate files, depending on the common areas they are to address, so I'd have a library of image functions separate from a library of string manipulation functions, then include whichever are pertinent to the project in hand.
I don't think having unused functions in the .h file will hurt you in any way. If you compile all the corresponding .m files containing the unused functions in your build target, then you will end up making a bigger executable than is required. Same goes for if you include the code via static libraries.
If you do use a function but you didn't include the right .m file or library, then you'll get a link error.