Are there any merge tools for source control that understand code? - version-control

I've recently been working through a large codebase, refactoring and generally improving design to increase coverage. Also, in quite a few files I've removed excess using statements, moved methods so that similar functionality is close together, added regions etc. but not actually changed the functionality of the code in the file.
Meanwhile, elsewhere on the team other developers are fixing bugs and changing lines of code here and there. Obviously when it comes to merging this can be an issue since line numbers no longer match and methods may have moved.
Now, I understand the general rule that in a source controlled environment it can be a dangerous thing to move methods around, and we decided that the benefit outweighed the cost. What I don't understand however is why it should be this way.
Say that my initial file was a simple calculator:
public class Calculator {
    public int Subtract(int a, int b) {
        return a + b;
    }

    public int Add(int a, int b) {
        return a + b;
    }
}
And I decided that I wanted the methods to be alphabetical:
public class Calculator {
    public int Add(int a, int b) {
        return a + b;
    }

    public int Subtract(int a, int b) {
        return a + b;
    }
}
While another developer fixed the bug in the Subtract method:
public class Calculator {
    public int Subtract(int a, int b) {
        return a - b;
    }

    public int Add(int a, int b) {
        return a + b;
    }
}
A standard merge tool would probably require you to manually merge these two files, but one that understood the functionality of the code would easily be able to reconcile these two changes. The same applies to removing or adding other methods, comments, regions or using statements.
So, to (finally!) get to the question: Are there any merge tools out there that have an intelligent understanding of the functionality of code and could merge the two files above without any human intervention? If not, why not? Are there any complications which make this an unsolvable problem? (Of course I understand it isn't as simple as I'm implying, but is it impossible for some reason that I can't see?)
I use C# in my source code and would love something that worked with that, but I'm interested in whether this exists anywhere in the world of programming...
I'm already really concerned about the length of this question, but edited to add how I would expect the intelligent source system to work:
When the initial calculator file was checked in the system would parse the file and create a hierarchy of the class:
File: Calculator.cs
|--Class[0]: Calculator
   |--Method[0]: Subtract
   |  |--Line[0]: return a + b;
   |--Method[1]: Add
      |--Line[0]: return a + b;
(With extra lines in there for braces etc...)
When I check in my code (making the methods alphabetical) it updates the hierarchy above so that Subtract becomes Method[1] and Add becomes Method[0].
The second developer checks in his code (which obviously the source control system knows was based off the original) and the system notices the change to the first line in Subtract. Now, rather than finding that line by line number in the overall file, it knows that it can find it at Calculator.cs/Calculator/Subtract/0, and the fact that the method has changed location doesn't matter; it can still make the merge work.
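To make the idea above a bit more concrete, here is a minimal sketch in Python (not C#) of a three-way merge that is keyed by method name rather than by line number. It assumes a parser has already turned each version of the file into a mapping from method name to body; that mapping, the `merge_methods` name and the conflict rule are all assumptions made for illustration, not how any existing tool works.

```python
from typing import Dict

# Each version of the file is modelled as an ordered mapping from method
# name to method body (a hypothetical, heavily simplified parse result).
Version = Dict[str, str]


def merge_methods(base: Version, mine: Version, theirs: Version) -> Version:
    """Three-way merge keyed by method name instead of line number.

    Method order is taken from `mine`; bodies are merged per method.
    Deleted methods are not handled in this sketch.
    """
    merged: Version = {}
    for name in mine:  # dicts keep insertion order (Python 3.7+)
        base_body = base.get(name)
        my_body = mine[name]
        their_body = theirs.get(name, base_body)
        if my_body == their_body or their_body == base_body:
            merged[name] = my_body        # only I changed it, or nobody did
        elif my_body == base_body:
            merged[name] = their_body     # only the other side changed it
        else:
            raise ValueError(f"conflict in method {name!r}")
    # Methods that only the other side added are appended at the end.
    for name, body in theirs.items():
        if name not in mine and name not in base:
            merged[name] = body
    return merged


base = {"Subtract": "return a + b;", "Add": "return a + b;"}
mine = {"Add": "return a + b;", "Subtract": "return a + b;"}    # reordered
theirs = {"Subtract": "return a - b;", "Add": "return a + b;"}  # bug fixed
merged = merge_methods(base, mine, theirs)
print(merged)
```

With the calculator example, the result keeps my alphabetical order and the other developer's fixed Subtract body. A real tool would also have to deal with renames, moved code inside a method, and deletions, which is where most of the difficulty lies.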

Our approach with Plastic SCM is still far from being "complete", but it's already released and can help in this kind of situation. Take a look at Xmerge. Of course, feedback will be more than welcome and will grant some free licenses ;-)

I think that Source Code in Database is one potential answer to your question. The general idea is that you don't version files, you version blocks of code. The versioning system knows about the code DOM, and lets you query on the code DOM in order to check out functions, classes, what-have-you, for editing, compiling, etc.
Since the order of the methods doesn't necessarily matter, they're not stored in the Database with any order in mind. When you check out the class, you can specify the order that you like best (alphabetical, public/protected/private, etc). The only changes that matter are the ones like where you switch the + to a -. You won't have a conflict due to reordering the methods.
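As a rough illustration of "order is just a view", here is a small Python sketch. The `CodeStore` class and its methods are hypothetical names invented for this example; they are not part of any SCID tool.

```python
from typing import Callable, Dict, List, Tuple


class CodeStore:
    """Hypothetical store that versions members by name, not by file position."""

    def __init__(self) -> None:
        self._members: Dict[str, str] = {}

    def put(self, qualified_name: str, source: str) -> None:
        # Storing a member records no ordering information at all.
        self._members[qualified_name] = source

    def checkout(
        self, order: Callable[[str], object] = str.lower
    ) -> List[Tuple[str, str]]:
        # The caller decides the presentation order at checkout time.
        names = sorted(self._members, key=order)
        return [(name, self._members[name]) for name in names]


store = CodeStore()
store.put("Calculator.Subtract", "return a - b;")
store.put("Calculator.Add", "return a + b;")

alphabetical = [name for name, _ in store.checkout()]
longest_first = [name for name, _ in store.checkout(order=lambda n: -len(n))]
```

Both checkouts return the same members; only the order differs, so reordering can never produce a merge conflict in this model.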
Unfortunately, SCID is still VERY young and there aren't many tools out there for it. However, it is quite an interesting evolution in the way one views and edits code.
Edit: Here's another reference for SCID


Swift get vs _read

What's the difference between the following 2 subscripts?
subscript(position: Int) {
    get { ... }
}
subscript(position: Int) {
    _read { ... }
}
_read is part of the Swift Ownership story that has been in development for a while now. Since read (the likely name once it goes through Swift Evolution) is a fairly advanced concept of the language, you will probably want to read at least the part of the Ownership Manifesto where it is described to get a fuller answer than I'll provide here.
It is an alternative to get on subscripts that allows you to yield a value instead of returning a value. This is essential for move-only types because they cannot be copied (that is their entire purpose), and a copy is what happens when you return a value. By using read you could, for example, have an Array of move-only types and still use the values in it without taking ownership of them by moving them. The easiest (and not technically correct, since it is a coroutine) way to think about it conceptually is that you get a pointer to the object that read yields.
The sibling of read is modify, which is currently in the pitch phase of Swift Evolution, so that pitch can also give you some helpful insight into what read is, since it is a coroutine as well.
So for now, if Xcode gives you a _read to implement, simply change it to get; the suggestion is a bug, since _read isn't an official part of the language yet.

Proper way to modify public interface

Let's assume we have a function that returns a list of apples in our warehouse:
List<Apple> getApples();
After some lifetime of the application we've found a bug: in rare cases clients of this function get sick because some of the apples returned are not ripe yet.
However, another set of clients absolutely does not care about ripeness; they use this function simply to know about all available apples.
A naive way of solving this problem would be to add a 'ripeness' member to the apple and then find all the places where ripeness can cause problems and put in some checks.
const auto apples = getApples();
for (const auto& apple : apples) {
    if (apple.isRipe()) {
        // only ripe apples are used here
    }
}
However, if we correlate this new requirement of having ripe apples with the way class interfaces are usually designed, we might find out that we need a new interface which is a subset of a more generic one:
List<Apple> getRipeApples();
which basically extends the getApples() interface by filtering out the apples that are not ripe.
So the questions are:
Is this the correct way of thinking?
Should the old interface (getApples) remain unchanged?
How will it handle scaling if later on we figure out that some customers are allergic to red/green/yellow apples (getRipeNonRedApples)?
Are there any other alternative ways of modifying the API?
One constraint, though: how do we minimize the probability of an inexperienced or inattentive developer calling getApples instead of getRipeApples? Subclass Apple with RipeApple? Make a downcast in getRipeApples?
A pattern found often with Java people is the idea of versioned capabilities.
You have something like:
interface Capability ...
interface AppleDealer {
    List<Apple> getApples();
}
and in order to retrieve an AppleDealer, there is some central service like
public <T> T getCapability (Class<T> type);
So your client code would be doing:
AppleDealer dealer = service.getCapability(AppleDealer.class);
When the need for another method comes up, you go:
interface AppleDealerV2 extends AppleDealer { ...
And clients that want V2 just do a getCapability(AppleDealerV2.class) call. Those that don't care don't have to modify their code!
Please note: of course, this only works for extending interfaces. You can't use this approach either to change signatures or to remove methods in existing interfaces.
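For readers who want to see the whole lookup in one place, here is a small Python sketch of the versioned-capability idea. Everything in it (`CapabilityService`, `register`, `WarehouseDealer`, the string apples) is made up for illustration; the Java snippets above only show the interface side.

```python
from abc import ABC, abstractmethod
from typing import List, Type, TypeVar

T = TypeVar("T")


class AppleDealer(ABC):
    @abstractmethod
    def get_apples(self) -> List[str]: ...


class AppleDealerV2(AppleDealer):
    """Extends the original capability; existing clients are unaffected."""

    @abstractmethod
    def get_ripe_apples(self) -> List[str]: ...


class WarehouseDealer(AppleDealerV2):
    def get_apples(self) -> List[str]:
        return ["ripe gala", "unripe fuji"]

    def get_ripe_apples(self) -> List[str]:
        return [a for a in self.get_apples() if a.startswith("ripe")]


class CapabilityService:
    """Hypothetical central service handing out implementations by type."""

    def __init__(self) -> None:
        self._providers: List[object] = []

    def register(self, provider: object) -> None:
        self._providers.append(provider)

    def get_capability(self, capability: Type[T]) -> T:
        for provider in self._providers:
            if isinstance(provider, capability):
                return provider
        raise LookupError(f"no provider for {capability.__name__}")


service = CapabilityService()
service.register(WarehouseDealer())

old_client_view = service.get_capability(AppleDealer).get_apples()
new_client_view = service.get_capability(AppleDealerV2).get_ripe_apples()
```

Old clients keep asking for AppleDealer and see every apple; only clients that explicitly request AppleDealerV2 get the ripe-only method.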
Regarding your question 3/4: I go with MaxZoom there, but to be precise: I would very much recommend for "flags" to be something like List<String>, or List<Integer> (for 'real' int-like flags) or even Map<String, Object>. In other words: if you really don't know what kind of conditions might come over time, go for interfaces that work for everything, like one where you can give a map with "keys" and "expected values" for the different keys. If you go for pure enums there, you quickly run into similar "versioning" issues.
Alternatively: consider allowing your client to do the filtering itself. With Java 8 you can think of Predicates, lambdas and all that stuff, using something like:
Predicate<Apple> applePredicate = new Predicate<Apple>() {
    public boolean test(Apple a) {
        return a.getColour() == AppleColor.GoldenPoisonFrogGolden;
    }
};
List<Apple> myApples = dealer.getApples(applePredicate);
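The same client-side filtering idea, sketched in Python with a plain callable standing in for the predicate. The `Apple` fields and the `AppleDealer` class here are assumptions for the sake of the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Apple:
    colour: str
    ripe: bool


class AppleDealer:
    """Hypothetical dealer whose single method accepts an optional filter."""

    def __init__(self, stock: Iterable[Apple]) -> None:
        self._stock = list(stock)

    def get_apples(
        self, predicate: Callable[[Apple], bool] = lambda apple: True
    ) -> List[Apple]:
        # With no predicate every apple is returned, so old callers still work.
        return [apple for apple in self._stock if predicate(apple)]


dealer = AppleDealer(
    [Apple("red", True), Apple("green", False), Apple("golden", True)]
)
everything = dealer.get_apples()
ripe_non_red = dealer.get_apples(lambda a: a.ripe and a.colour != "red")
```

New requirements such as "ripe and not red" become a new predicate at the call site instead of a new method on the interface.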
IMHO creating a new class/method for every possible Apple combination will result in code pollution. The situation described in your post could be gracefully handled by introducing a flags parameter:
List<Apple> getApples(); // keep for backward compatibility
List<Apple> getApples(FLAGS); // use flag as a filter
Possible flags:
So a call like the one below could be possible (note that bit flags are combined with |, not &):
List<Apple> apples = getApples(RIPE_FLAG | RED_FLAG | SWEET_FLAG);
that will produce a list of apples that are ripe and red-delicious.
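Here is a minimal Python sketch of the flags approach, assuming the flags are bit flags. The flag names mirror the ones in the call above, while `AppleFlags`, the `Apple` fields and the matching rule ("an apple must have every requested flag") are my assumptions.

```python
from dataclasses import dataclass
from enum import Flag
from typing import List


class AppleFlags(Flag):
    RIPE = 1
    RED = 2
    SWEET = 4


@dataclass(frozen=True)
class Apple:
    name: str
    traits: AppleFlags


def get_apples(stock: List[Apple], required: AppleFlags = AppleFlags(0)) -> List[Apple]:
    # An apple matches when it carries every requested flag;
    # the empty default keeps the old "return everything" behaviour.
    return [apple for apple in stock if (apple.traits & required) == required]


stock = [
    Apple("gala", AppleFlags.RIPE | AppleFlags.RED | AppleFlags.SWEET),
    Apple("granny smith", AppleFlags.RIPE),
    Apple("early red", AppleFlags.RED),
]

wanted = get_apples(stock, AppleFlags.RIPE | AppleFlags.RED | AppleFlags.SWEET)
ripe_only = get_apples(stock, AppleFlags.RIPE)

# Combining distinct bit flags with & yields the empty flag set, which
# would match every apple; that is why | is used to build the filter.
empty = AppleFlags.RIPE & AppleFlags.RED
```

The filter grows by adding flag values rather than methods, although every new criterion still means touching the flag type.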

Eclipse and Java 8 content assist

I wanted to check Java 8 integration with Eclipse Luna, so I downloaded the Luna M7 build.
After configuring the JDK to jdk8u5, I started some tests.
Let's say you have a nice Runnable like
Runnable r = new Runnable() {
    public void run() { System.out.println("foo"); }
};
If you select the
new Runnable() {
    public void run() { System.out.println("foo"); }
}
block and press Ctrl-1 (Quick Fix), you get the suggestion to change it to a lambda, resulting in Runnable r = () -> System.out.println("foo");, which is pretty cool.
But a nicer thing would be to actually help create a lambda expression.
For instance, if you type Runnable r = | (with | being the cursor location) and press Ctrl+Space (content assist), I would have expected to find a "create a lambda expression from this functional interface" option in the displayed popup. But nothing new is available.
Do you know if this will be implemented in the future?
I think it might have something to do with the templates (Java/Editor/Templates in preferences) but I actually never experimented with them.
Providing a good proposal right after the = is rather tricky, as almost anything could be placed on the right-hand side of an assignment.
Even the old way of implementing a function using an anonymous inner class was not proposed right after the equal sign. You had to type the four characters new␣ before the suggestion came up. And four characters is exactly what you have to type to create a lambda, ()->, but at this place proposing the creation of a lambda makes no sense anymore as you have already created it.
So proposing a lambda would require lifting its priority compared to other proposals so that it appears right after the equal sign, but it would still have rather limited benefit. You would have to press Ctrl+Space (unless you use the automatic menu popup), then select “create lambda” just to get either the four characters ()-> or something like name-> inserted, and the parameter name(s) are likely to be changed after the proposal is inserted anyway.
For an inner class, read method overriding, it makes sense to propose parameters as you have to repeat all parameter types exactly, but for a lambda where you can omit all the bulk the saving is very limited.
So I don’t expect a proposal of lambda creation to ever appear in the list.

Auto-RTrim Strings in Entity Framework, ServiceStack OrmLite, PetaPoco, etc

Edited for Clarity
I've been looking at ORMs for the last week, as well as trying to decide if I want to bother with them. At the end of the day, there seem to be about a dozen worthy contenders, of which most are fairly hard to tell apart. I eventually settled on the potential trio of EF, OrmLite and PetaPoco, all of which seem pretty good.
One feature I've been looking for is the ability to magically configure the code generator to automatically right trim all strings in the generated POCOs, without any changes to the DB. I have a database with literally thousands of records spread across hundreds of fields, and every single string field has a bunch of spaces at the end of it for legacy reasons. Those need to be stripped from the resulting POCOs/entities to make the processing less ugly, but I can't make any changes to the DB (it's not mine), so I'm wondering if there is an easy way to do it.
With Entity Framework I looked a little bit at the process for Database First and Model First design, and those look like you could probably tweak the T4 template code to generate appropriate code on a case by case basis. This seems like it would be viable, but I don't want to reinvent the wheel if someone has already done it. I would just like to have the code that takes care of the problem.
For the other ORMs, I could probably pull them in-house, figure out how they work and plug in some kind of logic that does the magic.
So does anybody have a suggestion for an ORM that has a configuration switch that can automatically right-trim all strings? It would make the database much easier to work with, and I'm one hundred percent certain there is never any value in those extra spaces at the end.
Thought this was a good feature so I've just added this to ServiceStack.OrmLite where you can now add a custom filter for strings, e.g:
OrmLiteConfig.StringFilter = s => s.TrimEnd();
public class Poco
{
    public string Name { get; set; }
}
using (var db = OpenDbConnection())
{
    db.Insert(new Poco { Name = "Value with trailing " });
    var row = db.Select<Poco>().First();
    Assert.That(row.Name, Is.EqualTo("Value with trailing"));
}
It will be in the next v4.0.19+ NuGet release of ServiceStack, which is now available on ServiceStack's MyGet Feed.
With Entity Framework (and possibly PetaPoco, which I don't know personally) you should be able to modify the T4 template and add read-only properties to your entities, returning the trimmed value of the database-related property.
public string Name
{
    get { return this.DbName.TrimEnd(); }
}
You have to find a way to do this for string properties only (I think one of the methods that are visible in the T4 template can be used for that, but I'm not sure).
Modifying T4 templates is something you may have to do again when updates are released.
You can't use the read-only properties directly in LINQ-to-entities because EF can't translate them into SQL. You'll always have to use them after an AsEnumerable() call.

Using table-of-contents in code?

Do you use a table of contents for listing all the functions (and maybe variables) of a class at the beginning of a big source code file? I know that an alternative to that kind of listing would be to split up big files into smaller classes/files, so that their class declarations would be self-explanatory enough, but some complex tasks require a lot of code. I'm not sure whether it is really worth spending your time subdividing the implementation into multiple files. Or is it OK to create an index listing in addition to the class/interface declaration?
To better illustrate how I use a table of contents, this is an example from my hobby project. It's actually not listing functions, but code blocks inside a function, but you can probably get the idea anyway.
void SelectionManager::FindSelection()
{
    // Order_mouse_from_to_points
    // Lines_intersecting_with_upper_point
    // Lines_intersecting_with_both_points
    // Lines_not_intersecting
    // Lines_intersecting_bottom_points
    // Update_intersection_range_indices
    // Rough_method
    // Normal_method
    // First_selected_item
    // Last_selected_item
    // Other_selected_item
}
Notice that index items don't have spaces. Because of this I can click on one of them and press F4 to jump to the item usage, and F2 to jump back (simple Visual Studio find-next/previous shortcuts).
Another alternative solution to this indexing is using collapsed C# regions. You can configure Visual Studio to show only region names and hide all the code. Of course keyboard support for that kind of source code navigation is pretty cumbersome...
"I know that an alternative to that kind of listing would be to split up big files into smaller classes/files, so that their class declarations would be self-explanatory enough."
"but some complex tasks require a lot of code"
Incorrect. While a "lot" of code may be required, long runs of code (over 25 lines) are a really bad idea.
"actually not listing functions, but code blocks inside a function"
Worse. A function that needs a table of contents must be decomposed into smaller functions.
"I'm not sure whether it is really worth spending your time subdividing the implementation into multiple files."
It is absolutely mandatory that you split things into smaller files. The folks that maintain, adapt and reuse your code need all the help they can get.
"is it OK to create an index listing in addition to the class/interface declaration?"
If you have to resort to this kind of trick, it's too big.
Also, many languages have tools to generate API docs from the code. Java, Python, C, C++ have documentation tools. Even with Javadoc, epydoc or Doxygen you still have to design things so that they are broken into intellectually manageable pieces.
Make things simpler.
Use a tool to create an index.
If you create a big index you'll have to maintain it as you change your code. Most modern IDEs create a list of class members anyway. It seems like a waste of time to create such an index.
I would never ever do this sort of busy-work in my code. The most I would do manually is insert a few lines at the top of the file/class explaining what this module did and how it is intended to be used.
If a list of methods and their interfaces would be useful, I generate them automatically, through a tool such as Doxygen.
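To show what "generate it automatically" can look like without a full documentation tool, here is a small Python sketch that builds a method index from a class using the standard inspect module. The `SelectionManager` class below is only a hypothetical stand-in with made-up methods and docstrings, and `build_index` is a name invented for this example; Doxygen itself works quite differently.

```python
import inspect
from typing import List


class SelectionManager:
    """Hypothetical stand-in for a large class that seems to need an index."""

    def find_selection(self):
        """Locate the items between the two mouse points."""

    def order_points(self):
        """Sort the mouse points from top to bottom."""

    def _helper(self):
        """Private helpers are left out of the index."""


def build_index(cls) -> List[str]:
    """Return one 'name(signature): summary' line per public method."""
    lines = []
    # getmembers returns (name, member) pairs sorted by name.
    for name, member in inspect.getmembers(cls, inspect.isfunction):
        if name.startswith("_"):
            continue
        doc = inspect.getdoc(member) or ""
        summary = doc.splitlines()[0] if doc else ""
        lines.append(f"{name}{inspect.signature(member)}: {summary}")
    return lines


index = build_index(SelectionManager)
print("\n".join(index))
```

Because the index is derived from the code, it cannot drift out of date the way a hand-written table of contents can.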
I've done things like this. Not whole tables of contents, but a similar principle -- just ad-hoc links between comments and the exact piece of code in question. Also to link pieces of code that make the same simplifying assumptions that I suspect may need fixing up later.
You can use Visual Studio's task list to get a listing of certain types of comment. The format of the comments can be configured in Tools|Options, Environment\Task List. This isn't something I ended up using myself but it looks like it might help with navigating the code if you use this system a lot.
If you can split your method like that, you should probably write more methods. After this is done, you can use an IDE to give you the static call stack from the initial method.
EDIT: You can use Eclipse's 'Show Call Hierarchy' feature while programming.
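As a rough picture of the "write more methods" advice, here is a Python sketch in which two of the table-of-contents entries become small functions and the original method only composes them. The one-dimensional line model and all function names are assumptions made for illustration; the real FindSelection is certainly more involved.

```python
from typing import List, Tuple


def order_points(mouse_from: int, mouse_to: int) -> Tuple[int, int]:
    """Was the 'Order_mouse_from_to_points' block: return (top, bottom)."""
    return (mouse_from, mouse_to) if mouse_from <= mouse_to else (mouse_to, mouse_from)


def lines_in_range(line_count: int, top: int, bottom: int) -> List[int]:
    """Was an 'intersecting lines' block: clamp the range to existing lines."""
    first = max(top, 0)
    last = min(bottom, line_count - 1)
    return list(range(first, last + 1))


def find_selection(line_count: int, mouse_from: int, mouse_to: int) -> List[int]:
    """The former long method now reads like its own table of contents."""
    top, bottom = order_points(mouse_from, mouse_to)
    return lines_in_range(line_count, top, bottom)


dragged_upwards = find_selection(10, 7, 3)
started_above_text = find_selection(5, -2, 1)
```

Once the blocks are functions, the IDE's outline or call hierarchy gives the navigation that the comment index was providing by hand.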