So many times when you do a diff of two versions of a code file, the tool completely screws up understanding what's changed... you maybe move a block of code inside an extra level of braces or add an extra ...} else if {... and suddenly it gets all out of sync.
So I wondered if any context-aware tools exist which actually try to understand the content and make smarter decisions, rather than doing a generic diff?
More usefully, can one plug such tools into VCS like git/SVN?
SemanticMerge for C# and Java sources
DiffDog is an XML-aware differ/merger
I have a list of binaries that link some static libraries. It was identified that a bunch of these libraries are circularly dependent. We never ran into trouble because we enclosed these static libraries between -Wl,--start-group and -Wl,--end-group.
Having understood that this is a bad practice, I'm trying to clean the system.
I've come up with a perl script that tells me how these libraries are dependent on each other, like so:
libchld.a depends on libprnt.a, libgprnt.a
libprnt.a depends on libncle.a, libgprnt.a
and so on.
Now, I should sort these topologically with each node either pointing upwards or downwards.
And then, if I find a set of cyclically dependent libraries while sorting topologically, I will have to enclose only those inside --start-group and --end-group (rather than enclosing the entire bunch of libraries), thereby cleaning up the system.
Are there any Perl modules that do this type of sorting already?
Sort::Topological
Graph::Directed
are the ones I'm trying out. But they don't appear to handle circular graphs.
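One way to get both the ordering and the cyclic sets in a single pass is to compute strongly connected components, for example with Tarjan's algorithm. The following is only a minimal sketch in Python rather than Perl. The function names are made up for illustration, and the libncle.a -> libprnt.a edge is a hypothetical cycle added so that the example has something to group. It assumes the dependency list has already been parsed into a dictionary.

```python
def strongly_connected_components(deps):
    """Tarjan's algorithm.

    deps maps a library to the libraries it depends on.
    Returns the components with dependencies before their dependents.
    """
    index, low = {}, {}
    stack, on_stack = [], set()
    result = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = low[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for dep in deps.get(node, ()):
            if dep not in index:
                visit(dep)
                low[node] = min(low[node], low[dep])
            elif dep in on_stack:
                low[node] = min(low[node], index[dep])
        if low[node] == index[node]:
            # node is the root of a component: pop its members off the stack
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            result.append(sorted(component))

    for node in sorted(deps):
        if node not in index:
            visit(node)
    return result


def link_line(deps):
    """Build a link line: dependents first, only real cycles wrapped in a group."""
    parts = []
    for group in reversed(strongly_connected_components(deps)):
        if len(group) > 1:
            parts.append("-Wl,--start-group " + " ".join(group) + " -Wl,--end-group")
        else:
            parts.append(group[0])
    return " ".join(parts)


deps = {
    "libchld.a": ["libprnt.a", "libgprnt.a"],
    "libprnt.a": ["libncle.a", "libgprnt.a"],
    "libncle.a": ["libprnt.a"],  # hypothetical cycle, for illustration only
    "libgprnt.a": [],
}
print(link_line(deps))
```

Libraries that are not part of any cycle come out as single-element components, so only the genuinely circular sets end up between --start-group and --end-group. The recursive visit is fine for a handful of libraries; a very deep dependency chain would need an iterative version.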
"Having understood that this is a bad practice, I'm trying to clean the system."
It's a bad practice because you are not using proper layering, not because it's somehow bad for the linker.
Therefore, cleaning up the link line without re-arranging the libraries into a proper hierarchy with no circular dependencies is a pointless exercise.
And if you do rearrange the libraries, then their proper order will be easy to understand and you wouldn't need to use Perl for that.
When a simple refactoring like “rename field” has been done on one branch, it can be very hard to merge the changes into the other branches. (Extract method is much harder, as the merge tools don’t seem to match the unchanged blocks well.)
Now in my dreams, I am thinking of a tool that can record (or work out) what well defined refactoring operations have been done on one branch and then “replay” them on the other branch, rather than trying to merge every line the refactoring has affected.
see also "Is there an intelligent 3rd merge tool that understands VB.NET" for the other half of my pain!
Also, has anyone tried something like MolhadoRef (blog article about MolhadoRef and Refactoring-aware SCM)? This is, in theory, refactoring-aware source control.
You could use coccinelle to do the same kind of refactoring operations on different branches. It will not record or figure out what is being done by itself; you have to explicitly tell it what to do. Other than that, it will more or less effortlessly do the same refactoring on as many branches as you point it at.
This tool has been used in the Linux kernel for updating API usage, etc.
To quote from its web page:
"Coccinelle is a program matching and transformation engine which provides the language SmPL (Semantic Patch Language) for specifying desired matches and transformations in C code."
Darcs supports a 'token replace' operation in a commit, which replaces all instances of one token with another, and merges as you'd want it to.
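To illustrate why recording a rename as an operation merges better than recording its line-by-line result, here is a rough sketch in Python. It is not how Darcs implements the feature; `replace_token` and the sample branch contents are made up for the example, and a token is simply assumed to be a run of letters, digits and underscores.

```python
import re


def replace_token(source, old, new):
    """Replace whole-token occurrences of old with new.

    A token is assumed to be delimited by anything that is not a letter,
    a digit or an underscore. Strings and comments are not treated specially.
    """
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])" + re.escape(old) + r"(?![A-Za-z0-9_])"
    )
    # Use a function as the replacement so backslashes in new are kept literally.
    return pattern.sub(lambda match: new, source)


# Two branches that diverged before the rename (hypothetical contents).
branch_a = "int count = 0;\ncount += recount(count);\n"
branch_b = "print(count); // discount applied\n"

# The same recorded operation is replayed on each branch independently.
renamed_a = replace_token(branch_a, "count", "total")
renamed_b = replace_token(branch_b, "count", "total")
print(renamed_a)
print(renamed_b)
```

Because the operation is re-run against whatever each branch currently contains, lines added on the other branch after the split are renamed too, which a textual merge of the changed lines would miss.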
Araxis Merge doesn't understand common refactoring but it is the only three way merge tool that I've used. It is available for both the Mac and Windows, and it supports an Automation API so I would imagine that you could do what you want with that if you were so inclined. For the record I have no connection with Araxis other than I've used their product.
The 3-way merge tool in Plastic SCM (www.plasticscm.com) implements Xmerge, which is the only one able to assist you in merging code that has been moved.
There are now some better merge tools (for example SemanticMerge) that are based on language parsing and designed to deal with code that has been moved and modified. JetBrains (the creator of ReSharper) has just posted a blog on this.
There has been lots of research on this over the years, at last some products are coming to market.
On Linux you can use Meld, or on Windows, WinMerge.
In any case, both tools only "understand" lines of text. Refactoring requires a way of understanding the code, which is beyond any merging/comparing tool that I know of.
In my C project I have quite a large utils.c file. It is really full of many utilities of different sorts. I feel a bit naughty just stuffing different miscellaneous functions in there. For example it has some utilities related to low level stuff such as a lowercase() function, and it also has some quite sophisticated utilities such as converting to/from different colour formats.
My question is, is it very naughty to have such a large utils.c with many different types of utilities in it? Should I break it up into many different kinds of utility files, such as graphics_utils.c and so on? What do you think?
Breaking them up into separate files based on categories (i.e. graphics, strings, etc.) will lead to better organization: it is easier to locate certain pieces of code, and you have smaller files to go through instead of just one large file.
You want to break it up, not just for organizational reasons, but because you will have many other files that depend on this one. Because everything will depend on this file, it makes this one file difficult to change because it might cause widespread breakage.
http://ifacethoughts.net/2006/04/15/stable-dependencies-principle/
If it's just you that will EVER maintain the stuff, it's a matter of when the complexity gets to the point where you find yourself searching for things. That would be the time to refactor and reorganize (there's a cost to reorganize, just as there's a cost to not reorganize).
If it's POSSIBLE that anyone else will maintain a project that includes your utils, you have to consider THEIR pain point when deciding when to reorganize. Theirs is MUCH lower than yours.
I tend to break them up into various sub-utils as you say (graphics_utils) when it becomes appropriate.
Break it up. Stuff will be easier to find, easier to reuse, easier to refactor, easier to unit test. I recently needed to get a set of ISO-8601 date handling methods out of a ginormous Java utility class of static methods, and it was really hard to find the 5% of the code I needed.
It is definitely not kosher, because the next guy coming through your code won't know where to look for anything. Break it up by function, and your coworkers will thank you!
Another advantage that comes from breaking up the file into separate files is that when you place it under source control, you can have finer-grained control. This really is useful if you have bits that are tweaked/extended/specialised frequently, and other bits that are relatively stable.
Another point: you should organize your code, i.e. break it up into smaller modules and categorize it, because at some point you will end up writing a second and third function for the same thing, simply because you can't find the function that you know is there but whose name you don't remember.
I've got a (rather large) project with such a module and there is programming logic for which there are up to 5-6 implementations (for the same thing).
Like everyone else I would break them up. But I tend to use Extension Methods now, so I would have one class (and one file) per class being extended (e.g. StringExtensions, SqlDataReaderExtensions, etc). I find this tends to break up the utility methods nicely.
How would you define "unwanted code"?
Edit:
IMHO, any code member with 0 active calling members (checked recursively) is unwanted code. (Functions, methods, properties and variables are all members.)
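"Checked recursively" amounts to a reachability search over the call graph: a member is only alive if some chain of callers leads back to a real entry point. A minimal sketch in Python, assuming the call graph has already been extracted by some other tool; the function name and the member names are hypothetical.

```python
from collections import deque


def unreachable_members(calls, entry_points):
    """Return members that cannot be reached from any entry point.

    calls maps each member to the members it calls or references.
    """
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        caller = queue.popleft()
        for callee in calls.get(caller, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)

    # Collect every member mentioned anywhere in the graph.
    all_members = set(calls)
    for callees in calls.values():
        all_members.update(callees)
    return sorted(all_members - seen)


# Hypothetical call graph: old_helper has a caller, but that caller is itself dead.
calls = {
    "main": ["parse", "render"],
    "parse": ["lowercase"],
    "render": [],
    "old_export": ["old_helper"],
    "old_helper": ["lowercase"],
}
dead = unreachable_members(calls, ["main"])
print(dead)
```

Here old_helper is reported even though it has one caller, because that caller is never reached from main. Anything invoked through reflection, callbacks or external code would have to be listed as an entry point, otherwise it is reported as dead by mistake.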
Here's my definition of unwanted code:
Code that does not execute is dead weight. (Unless it's a [malicious] payload for your actual code, but that's another story :-))
Code that is repeated multiple times increases the cost of the product.
Code that cannot be regression tested increases the cost of the product as well.
You can either remove such code or refactor it, but you don't want to keep it around as it is.
0 active calls and no possibility of use in the near future. And I prefer never to comment anything out in case I need it later, since I use SVN (source control).
Like you said in the other thread, code that is not used anywhere at all is pretty much unwanted. As for how to find it, I'd suggest FindBugs or CheckStyle if you were using Java, for example, since these tools check to see if a function is used anywhere and mark it as unused if it isn't. Very nice for getting rid of unnecessary weight.
Well, after thinking about it briefly, I came up with these three points:
it can be code that should be refactored
it can be code that is not called any more (leftovers from earlier versions)
it can be code that does not conform to your style guide and way of coding
I bet there is a lot more but, that's how I'd define unwanted code.
In Java I'd mark the method or class with @Deprecated.
Any PRIVATE code member with no active calling members (checked recursively). Otherwise you cannot know whether the code is used outside the scope of your analysis.
Some things are already posted but here's another:
Functions that do almost the same thing. (Only one small variable differs, so the whole function is copy-pasted and that variable is changed.)
Usually I tell my compiler to be as annoyingly noisy as possible; that picks up 60% of the stuff that I need to examine. Unused functions that are months old (after checking with the VCS) usually get ousted, unless their author tells me when they'll actually be used. Stuff missing prototypes is also instantly suspect.
I think trying to implement automated house cleaning is like trying to make a USB device that guarantees that you 'safely' play Russian Roulette.
The hardest part to check are components added to the build system, few people notice those and unused kludges are left to gather moss.
Beyond that, I typically WANT the code, I just want its author to refactor it a bit and make their style the same as the rest of the project.
Another helpful tool is doxygen, which does help you (visually) see relations in the source tree. However, if it's set not to extract static symbols/objects, it's not going to be very thorough.
I have taken over a large code base and would like to get an overview how and where certain classes and their methods are used.
Is there any good tool that can somehow visualize the dependencies and draw a nice call tree or something similar?
The code is in C++ in Visual Studio if that helps narrow down any selection.
Here are a few options:
CodeDrawer
CC-RIDER
Doxygen
The last one, doxygen, is more of an automatic documentation tool, but it is capable of generating dependency graphs and inheritance diagrams. It's also licensed under the GPL, unlike the first two which are not free.
When I have used Doxygen it has produced a full list of callers and callees. I think you have to turn it on.
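For reference, the caller/callee output is controlled from the Doxyfile. A minimal fragment along these lines should switch it on; it assumes Graphviz is installed for the graph options, and the EXTRACT_* lines are only there so that undocumented and static members are not skipped.

```
# Doxyfile fragment (a sketch, adjust to the project)
HAVE_DOT               = YES   # use Graphviz dot to draw the graphs
CALL_GRAPH             = YES   # per-function graph of callees
CALLER_GRAPH           = YES   # per-function graph of callers
REFERENCED_BY_RELATION = YES   # textual list of callers
REFERENCES_RELATION    = YES   # textual list of callees
EXTRACT_ALL            = YES   # include undocumented entities
EXTRACT_STATIC         = YES   # include static members of a file
```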
David, thanks for the suggestions. I spent the weekend trialing the programs.
Doxygen seems to be the most comprehensive of the 3, but it still leaves some things to be desired in regard to callers of methods.
All 3 seem to have problems with C++ templates to varying degrees. CC-Rider simply crashed in the middle of the analysis and CodeDrawer does not show many of the relationships. Doxygen worked pretty well, but it too did not find and show all relations and instead overwhelmed me with lots of macro references until I filtered them out.
So, maybe I should clarify "large codebase" a bit for eventual other suggestions: >100k lines of code overall spread out over more than 100 template files plus several actual class files pulling it all together.
Any other tools out there, that might be up to the task and could do better (more thoroughly)? Oh and specifically: anything that understands IDL and COM interfaces?
"When I have used Doxygen it has produced a full list of callers and callees. I think you have to turn it on."
I did that of course, but like I mentioned, doxygen does not consider interfaces between objects as they are defined in the IDL. It "only" shows direct C++ calls.
Don't get me wrong, it is already amazing what it does, but it is still not complete from my high level view trying to get a good understanding of how everything fits together.
In Java I would start with JDepend. In .NET, with NDepend. Don't know about C++.