Whitespace in version control (darcs) - version-control

A junior programmer in our office has an unfortunate (but understandable) habit of using Eclipse's "Correct all the indentation in this file" feature. As a result, his checked out copy includes thousands of lines that register as changes, simply because the whitespace is different. Accepting all these changes - while other people are also working on the same code, some of them in different offices - will lead to conflicts. At the same time we don't want to throw away all the work he's done.
Are there any options for Darcs to ignore or normalise whitespace changes; or tools that can revert the differences?

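I have never used darcs, but this is how I'd deal with it:
(cd newbies-copy && darcs diff -u --diff-opts -w) | (cd fresh-copy && patch -p1)
Hopefully I got the darcs commands right; I just skimmed the manual. Note that the output is plain diff text, so it goes to patch; darcs apply only accepts darcs patch bundles.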
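Before applying anything, it may help to know which files in the reindented copy differ from a clean checkout only in whitespace, since those can simply be reverted. Here is a minimal Python sketch of that check, assuming both versions of a file are available as strings. The function name is made up for illustration, and the comparison ignores line breaks as well as indentation, so it is not safe for languages where whitespace carries meaning.

```python
def differs_only_in_whitespace(old_text: str, new_text: str) -> bool:
    """Return True if the two texts differ, but only in whitespace.

    str.split() with no arguments splits on any run of whitespace, so
    two texts with the same token sequence are treated as equivalent.
    """
    if old_text == new_text:
        return False  # identical: nothing to revert
    return old_text.split() == new_text.split()


if __name__ == "__main__":
    clean = "if (x) {\n\ty();\n}\n"
    reindented = "if (x) {\n    y();\n}\n"
    print(differs_only_in_whitespace(clean, reindented))
```

Files for which this returns True could be reverted wholesale; the rest contain real work and need a proper whitespace-insensitive diff.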

Related

Is there a way to check when an Emacs Lisp function was added to Emacs?

I am the author of an Emacs package, and occasionally when working on my package, I will come across a useful-looking function and use it in my code. Then, shortly after I release, someone using an older Emacs version (but still a version that I want to support) will report that the function is not defined, and I realize that this function has only recently been added in the latest version of Emacs, and am forced to revert the change (example). So, is there any way to check the version of Emacs in which a specific function was first added? Or better yet, a way to read my whole elisp file and report the minimum version of Emacs it would require and why?
I suppose the brute-force solution would be to install every version of Emacs I want to support and test-compile my package with each of them. So I'm asking if there's a better way than that.
The file etc/NEWS and its siblings for older versions, NEWS.[12]*, contain a very detailed change history for every Emacs version since the neolithic age. (The oldest ones are -- perhaps predictably -- somewhat less detailed.)
Unfortunately, this information is not in machine readable form; but you could still grep around for the information with some less-than-perfect precision.
Here is a quick and dirty Awk script I cobbled together.
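#!/bin/sh
# Usage: pass the function name to look for as the first argument.
# Path to your emacs etc directory
emacs_etc=$HOME/git/emacs-snapshot/etc
# Reverse the order of the wildcard matches so we search newest to oldest
printf '%s\n' "$emacs_etc"/NEWS.[12]* "$emacs_etc"/NEWS |
tac |
xargs awk -v fn="$1" 'BEGIN { regex="^[*]+ New .*" fn }
# A top-level "* ..." heading ends with the Emacs version number
/^\*[^*]/ { version=$NF }
$0~regex { print version ":" $0; found=1; exit }
# exit in a main rule still runs END, so the status is decided here
END { exit !found }'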
This is slightly inexact in that it requires the entry to begin with New (this seems to be somewhat consistent in the newer entries, but less so in the older ones) and it will find a prefix of the search string, not necessarily an exact match.
However, the change you are looking for does not match this expected format. The commit which added set-default-toplevel-value just added a free-form notice to NEWS which does not mention that it introduced a new function.
To actually find it, I located it in the source tree (src/eval.c) and a straightforward git blame on the line which contains the definition pointed straight to the pertinent commit. (In the general case, you might need to peel off commits in layers; fortunately, Git has fairly good support for this kind of thing.)
If you want to know the version of Emacs in which a specific function was introduced, for relatively recent versions of Emacs (version 18 and later), you can search the NEWS files located in the etc directory.
Clone the latest Emacs git repo from https://git.savannah.gnu.org/cgit/emacs.git
Then you can grep for the function name in the emacs/etc directory, which contains the NEWS files:
NEWS NEWS.18 NEWS.20 NEWS.22 NEWS.24 NEWS.26
NEWS.1-17 NEWS.19 NEWS.21 NEWS.23 NEWS.25 NEWS.27
It will depend on what you're looking for, but if the name you're looking for is distinctive enough you should find something in there.
Using the excellent ripgrep tool I was looking for that info related to the function string-to-syntax and the macro defvar-local, and this is what I got:
> rg string-to-syntax
NEWS.21
3031:** The new function `string-to-syntax' can be used to translate syntax
3038: (string-to-syntax "()")
>
> rg defvar-local
NEWS.24
2189:** New macros `setq-local' and `defvar-local'.
>
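The same search can be scripted so that it reports a version number along with the matching line. Below is a minimal Python sketch, not a vetted tool. Like the Awk script above, it assumes that top-level NEWS headings start with a single asterisk and end with the version number. The function name and the sample text (including the version 99.1) are made up for illustration. Unlike a plain grep, it matches only the whole symbol, so searching for defvar does not hit defvar-local.

```python
import re


def first_mention(news_text: str, symbol: str):
    """Return (version, line) for the first line mentioning symbol, else None."""
    # Lisp symbols may contain hyphens, so treat "-" as part of a name.
    pattern = re.compile(r"(?<![\w-])" + re.escape(symbol) + r"(?![\w-])")
    version = None
    for line in news_text.splitlines():
        # Assumed heading shape: "* Some title ending in <version>"
        if line.startswith("* "):
            version = line.split()[-1]
        if pattern.search(line):
            return version, line
    return None


if __name__ == "__main__":
    # Made-up excerpt in the style of a NEWS file.
    sample = (
        "* Changes in Emacs 99.1\n"
        "\n"
        "** New macros `setq-local' and `defvar-local'.\n"
    )
    print(first_mention(sample, "defvar-local"))
    print(first_mention(sample, "defvar"))
```

To cover every release, one would run this over each NEWS file in turn, newest to oldest or the other way round depending on which mention is wanted.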

Tools to diff, patch or merge on a word-by-word basis

For text files rather than source code, such as LaTeX, markdown or reStructuredText, single line breaks usually do not matter for the semantics, and paragraphs are frequently refilled to fit within 80 columns. When things are changed, the line breaks might change by quite a lot, so the common line-by-line diff and patch tools do not work very well for them. I am wondering whether good tools already exist for diffing, patching and even merging this kind of change. wdiff and git diff --color-words do exactly this kind of diffing, but they seem to lack patching and merging capability. Ideally, if we have got a line
He do not owe us nothing.
and one author changed it into
He do not owe us anything.
and another author changed it into
He does not owe us nothing.
then a merge could give
He does not owe us anything.
without conflict. That is the ideal result. Thanks in advance.
Besides meld, you can also use Beyond Compare or WinMerge
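To make the idea in the question concrete, here is a minimal Python sketch of a three-way merge on words rather than lines, built on the standard difflib module. It is an illustration under simplifying assumptions, not a replacement for a real merge tool: the function names are made up, words are whatever str.split() yields, the original line breaks are discarded (the result would need refilling), and two edits that touch the same or adjacent words are reported as a conflict instead of being resolved.

```python
import difflib


def _edits(base, other):
    """Collect the non-equal opcodes from base to other as
    (start, end, replacement) triples over base word positions."""
    matcher = difflib.SequenceMatcher(a=base, b=other, autojunk=False)
    return {
        (i1, i2, tuple(other[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    }


def merge_words(base: str, ours: str, theirs: str) -> str:
    """Three-way merge of two edited versions of base, word by word."""
    words = base.split()
    # The set union collapses edits made identically on both sides.
    edits = sorted(_edits(words, ours.split()) | _edits(words, theirs.split()))
    merged, pos, last_end = [], 0, -1
    for start, end, replacement in edits:
        if start <= last_end:
            raise ValueError(f"conflicting edits near word {start}")
        merged.extend(words[pos:start])  # untouched words before the edit
        merged.extend(replacement)       # the edited words
        pos = last_end = end
    merged.extend(words[pos:])
    return " ".join(merged)


if __name__ == "__main__":
    print(merge_words(
        "He do not owe us nothing.",
        "He do not owe us anything.",
        "He does not owe us nothing.",
    ))  # prints "He does not owe us anything."
```

Because the two authors changed different words, their edits do not overlap and both are applied; had both rewritten "do" differently, the function would raise instead of guessing.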

Does a bisect in version control benefit from using a rebaseif workflow?

The rebaseif Mercurial extension automates the process, when pulling, of doing a rebase only if the merge can be done automatically with no conflicts. (If there are conflicts to resolve manually, it does not rebase, leaving you ready to do a manual merge of the two branches.) This simplifies and linearizes the history when developers are working in different parts of the code, although any rebase does throw away some information about the state of the world when a developer was doing work. I tend to agree with arguments like this and this that in the general case, rebasing is not a good idea, but I find the rebase-if philosophy appealing for the non-conflict case. I'm on the fence about it, even though I understand that there are still risks of logic errors when changes happen in different parts of the code (and the author of the rebaseif extension has come to feel it's a bad idea).
I recently went through a complicated and painful bisect, and I think that having a large number of merges of short branches in our repository was the main reason the bisect did not live up to its implied O(lg n) promise.  I found myself needing to run "bisect --extend" many times, to stretch the range beyond the merge, going by a couple of changesets at a time, essentially making bisect O(n).  I also found it very complicated to keep track of how the bisect was going and to understand what information I'd gained so far, because I couldn't follow the branching when looking at graphs of the repository.
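For reference, the O(lg n) promise comes from a plain binary search, which is easiest to see on a strictly linear history. The following Python sketch is only a model of that idea: the function name is made up, commits are assumed to be listed oldest to newest with the first known good and the last known bad, and real hg bisect or git bisect have to walk a graph with merges rather than a list.

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit in a linear history.

    commits is ordered oldest to newest; commits[0] is assumed good and
    commits[-1] is assumed bad. Returns (commit, number_of_tests_run).
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(commits[mid]):
            bad = mid   # the regression is at mid or earlier
        else:
            good = mid  # the regression is after mid
    return commits[bad], tests


if __name__ == "__main__":
    history = list(range(1024))  # stand-ins for changeset numbers
    culprit, tests = first_bad(history, lambda c: c >= 700)
    print(culprit, tests)
```

With 1024 changesets the loop needs at most 10 tests, because every test halves the remaining range.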
Are there better ways to use bisect (and to look at and understand the revision history), or am I right that the process would have been smoother if we had used rebaseif more in development? Alternatively, can you help me understand more concretely what may go wrong using rebase in the non-conflict case: is it likely enough to cause problems that it should be avoided?
I’m tagging this more generally (not just mercurial) since I think rebaseif matches a more typical git workflow: git users may have seen the gotchas.
I think the answer is simple: you have to decide between hard bisects and risky rebasing.
Or, something in between: only rebase if it is very unlikely that the rebase silently breaks things. If a rebase involves only a few changesets which additionally are semantically distant to the changes they are rebased on, it's usually safe to rebase.
Here's an example, where a conflict-free merge breaks things:
Suppose two branches start from a file with this content:
def foo(a):
    # do
    # something
    # with a (an integer)
    ...
foo(4)
In branch A, this is changed to:
def foo(a):
    # now this function is 10 times faster, but only works with positive integers
    assert a > 0
    # do
    # something with
    # with a
    ...
foo(4)
In branch B, it is changed to:
def foo(a):
    # do
    # something
    # with a (an integer)
    ...
foo(4)
...
foo(-1) # now we have a use case where we need to call foo with -1
Semantically, both edits conflict with each other. However, Mercurial happily merges them without conflicts (in both cases, when rebasing or when doing a regular merge):
def foo(a):
    # now this function is 10 times faster, but only works with positive integers
    assert a > 0
    # do
    # something with
    # with a
    ...
foo(4)
...
foo(-1) # now we have a use case where we need to call foo with -1
The advantage of a merge is that it allows you to understand what went wrong at some later point, so you can fix things accordingly. A rebase might throw away information you need to understand bugs caused by automatic merges.
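The breakage in the merged result only shows up when the code runs, which is why a build and test step after every automatic merge or rebase is worth having. Here is a runnable reduction of the merged file as a sketch: the body of foo is a placeholder, since the original elides what it does, and run_merged_callers is a made-up wrapper around the two calls.

```python
def foo(a):
    # Branch A's change: faster, but only valid for positive integers.
    assert a > 0
    return a * 10  # placeholder body; the original example elides it


def run_merged_callers():
    foo(4)   # the call both branches started with
    foo(-1)  # the call branch B added


if __name__ == "__main__":
    try:
        run_merged_callers()
    except AssertionError:
        print("conflict-free merge, but the merged code fails at runtime")
```

Note that Python skips assert statements when run with -O, so a check like this would stay silent in an optimized run.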
The main argument against git rebase seems to be a philosophical one around "losing history", but if I really cared about that I'd make the final build step a checkin (or the first build step to track all the failed builds too!).
I'm not particularly familiar with Mercurial or bisecting (except that it's a bit like git), but in my month-and-a-bit with git I exclusively stuck to rebase. I also use git rebase -i --autosquash and git add -p a lot.
IME, there's also not that much difference between a rebase and a merge when it comes to fixing conflicts — the answer you linked to suggests "rebaseif" is bad because the "if" conditions on whether the merge proceeded without conflict, whereas it should be conditioned on whether the codebase builds and tests pass.
Perhaps my thinking is skewed by an inherent weakness in git's design (it doesn't explicitly keep track of the history of a branch, i.e. the subset of commits that it's actually pointed to), or perhaps it's just how I work (check that the diff is sane and that it builds, although admittedly after a rebase I don't check that intermediate commits build).
(Aside: For personal projects I often would like to keep track of each build output and corresponding source snapshot, but I've yet to find anything which is good at doing so.)

Is there a diff tool (patch) that is aware of indentation?

I'm regularly using the gnu-utils patch and diff. Using git, I often do:
git diff
Often simple changes create a large patch because the only thing that changed was, for example, the addition of an if/else block, so that everything inside it is indented to the right.
Reviewing such a patch can be cumbersome because only a manual line-by-line comparison can indicate whether anything has essentially changed within the indented code. We may be speaking about a few lines of code only, or about dozens (or many more) lines of nested code. (I know: such a hypothetically large function would better be split into smaller functions, but that's beside the point.)
Can't GNU diff/patch be aware when the only change within a code block is the indentation and let the developer know as much?
Are there any other diff tools that operate this way?
Edit: OK, there is --ignore-space-change, but then we are in an either/or situation: either we have a patch that is more readable for a human, or we have a complete patch that the machine knows how to apply. Can't we have the best of both worlds with a more elaborate diff tool that would show whitespace changes to the human for what they are, while allowing the machine to apply the patch fully?
With GNU diff you can pass -b or --ignore-space-change to ignore changes in the amount of white space in a patch.
If you use emacs and have been sent a patch, you can also use M-x diff-ignore-whitespace-hunk to reformat the patch to ignore white space in a particular hunk. Or diff-refine-hunk to highlight changes at a character by character level, which tends to point out the "meat" of a change.
As for applying patches, you can use the -l or --ignore-whitespace with GNU patch to ignore tabs and spaces changes. Just be careful with Python code :-)
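The "best of both worlds" asked for in the question can be approximated by matching lines with their leading whitespace stripped and then marking the lines whose only difference is indentation. Here is a minimal Python sketch using the standard difflib module. The function name and the "~" marker are made up for illustration, and the output is a review aid rather than something patch could apply.

```python
import difflib


def indent_aware_diff(old_lines, new_lines):
    """Return (marker, line) pairs: " " unchanged, "~" indentation only,
    "-" removed, "+" added. Lines are matched ignoring leading whitespace."""
    matcher = difflib.SequenceMatcher(
        a=[line.lstrip() for line in old_lines],
        b=[line.lstrip() for line in new_lines],
        autojunk=False,
    )
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Same content; check whether the indentation moved.
            for old, new in zip(old_lines[i1:i2], new_lines[j1:j2]):
                result.append((" " if old == new else "~", new))
        else:
            result.extend(("-", line) for line in old_lines[i1:i2])
            result.extend(("+", line) for line in new_lines[j1:j2])
    return result


if __name__ == "__main__":
    old = ["x = 1", "y = 2"]
    new = ["if c:", "    x = 1", "    y = 2"]
    for marker, line in indent_aware_diff(old, new):
        print(marker, line)
```

A reviewer then only has to read the "+" and "-" lines closely, while the full new text is still present for a tool that needs it.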
For what it's worth, using git difftool with a tool like meld or xxdiff makes the diff much more readable.
I don't know about git diff. But a diff-like tool that understands not just indentation but in fact any layout changes in your target language is our Smart Differencer.
This tool parses the before- and after-versions of your code the same way a compiler does, and compares the resulting syntax trees, so it isn't affected by whitespace changes of any kind (except semantically important whitespace such as Python indentation), inserted or deleted comments, or even a change of radix on constants.
The result is reported in terms of programmer editing actions ("move, insert, delete, copy, rename") over language structures (expressions, statements, declarations, blocks, methods, ...) rather than "insert line" or "delete line".
I try to not do file-wide indentation changes in the same commit as some other changes. And I commit the indentation changes in a separate commit before or after, with a commit message of "Changed indentation only.", to make it clear so that no manual diff inspection is needed, to see if something else was changed.

Code formatting and source control diffs

What source control products have a "diff" facility that ignores white space, braces, etc., in calculating the difference between checked-in versions? I seem to remember that Clearcase's diff did this but Visual SourceSafe (or at least the version I used) did not.
The reason I ask is probably pretty typical. Four perfectly reasonable developers on a team have four entirely different ways of formatting their code. Upon checking out the code last changed by someone else, each will immediately run some kind of program or editor macro to format things the way they like. They make actual code changes. They check-in their changes. They go on vacation. Two days later that program, which had been running fine for two years, blows up. The developer assigned to the bug does a diff between versions and finds 204 differences, only 3 of which are of any significance, because the diff algorithm is lame.
Yes, you can have coding standards. Most everyone finds them dreadful. A solution where everyone can have their cake and eat it too seems far more preferable.
=========
EDIT: Thanks to everyone for some great suggestions.
What I take away from this is:
(1) A source control system with plug-in type diffs is preferable.
(2) Find a diff with suitable options.
(3) Use a good source formatting program and settle on a check-in standard.
Sounds like a plan. Thanks again.
Git does have these options:
--ignore-space-at-eol
    Ignore changes in whitespace at EOL.
-b, --ignore-space-change
    Ignore changes in amount of whitespace. This ignores whitespace at line end, and considers all other sequences of one or more whitespace characters to be equivalent.
-w, --ignore-all-space
    Ignore whitespace when comparing lines. This ignores differences even if one line has whitespace where the other line has none.
I am not sure if brace changes can be ignored using Git's diff.
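The difference between the three options is easier to see when each is written as a normalization applied to a line before comparing. The Python sketch below only approximates the documented behaviour quoted above; it is not how Git implements it, and the function names are made up.

```python
import re


def at_eol(line: str) -> str:
    """Like --ignore-space-at-eol: drop trailing whitespace only."""
    return line.rstrip()


def space_change(line: str) -> str:
    """Like -b: drop trailing whitespace, collapse other runs to one space."""
    return re.sub(r"\s+", " ", line.rstrip())


def all_space(line: str) -> str:
    """Like -w: remove whitespace entirely."""
    return re.sub(r"\s+", "", line)


def same(a: str, b: str, normalize) -> bool:
    """Compare two lines after applying the chosen normalization."""
    return normalize(a) == normalize(b)


if __name__ == "__main__":
    for name, rule in [("eol", at_eol), ("-b", space_change), ("-w", all_space)]:
        print(name,
              same("x = 1  ", "x = 1", rule),
              same("x  =  1", "x = 1", rule),
              same("x=1", "x = 1", rule))
```

Each option is strictly more forgiving than the one before it: only -w treats "x=1" and "x = 1" as the same line.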
If it is C/C++ code, you can define Astyle rules and then convert the source code's brace style to the one that you want, using Astyle. A git diff will then produce sane output.
Choose one (dreadful) coding standard, write it down in some official coding standards document, and get on with your life. Messing with whitespace is not productive work.
And remember you are a professional developer; it's your job to get the project done. Changing anything in the code because of a personal style preference hurts the project: it won't only make diffing more difficult, it can also introduce hard-to-find problems if your source formatter or compiler has bugs (and your fancy diff tool won't save you when two co-workers start fighting over casing).
And if someone just doesn't agree to work with the selected style, just remind him (or her) that he is programming as a profession, not as a hobby; see http://www.ericsink.com/entries/No_Great_Hackers.html
Maybe you should choose one format and run some indentation tool before checking in so that each person can check out, reformat to his/her own preferences, do the changes, reformat back to the official standard and then check in?
A couple of extra steps but they already use indentation tools when working. Maybe it can be a triggered check-in script?
Edit: this would perhaps also solve the brace problem.
(I haven't tried this solution myself, hence the "perhaps" and "maybes", but I have been in projects with the same problems, and it is a pain to try to go through diffs with hundreds of irrelevant changes that are not limited to whitespace but include the formatting itself.)
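The check-in side of this idea can be sketched as a function that converts a file to the agreed format, plus a check that a hook could run to reject files that are not in that format. This Python sketch is a stand-in only: a real setup would call an actual formatter such as Astyle, the house style here (tabs expanded to four spaces, no trailing whitespace) is an assumption, and the wiring into a particular version control system's hook is not shown.

```python
def to_house_style(source: str, tab_width: int = 4) -> str:
    """Normalize to the assumed house style: tabs expanded to spaces,
    no trailing whitespace, and a single newline at the end of each line."""
    lines = [line.expandtabs(tab_width).rstrip() for line in source.splitlines()]
    return "\n".join(lines) + "\n"


def ready_to_check_in(source: str) -> bool:
    """A hook could refuse the check-in when this returns False."""
    return source == to_house_style(source)


if __name__ == "__main__":
    personal = "if x:\n\ty()  \n"
    official = to_house_style(personal)
    print(ready_to_check_in(personal), ready_to_check_in(official))
```

Since everyone's check-ins pass through the same normalization, diffs between checked-in versions only show real changes, whatever each developer's editor did in between.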
As explained in Is it possible for git-merge to ignore line-ending differences?, it is more a matter to associate the right diff tool to your favorite VCS, rather than to rely on the right VCS option (even if Git does have some options regarding whitespace, like the one mentioned in Alan's answer, it will always be not as complete as one would like).
DiffMerge is the most complete when it comes to those "ignore" options, as it can ignore not only spaces but also other "variations" based on the programming language used in a given file.
Subversion apparently supports this, either natively in the latest versions, or by using an alternate diff like GNU diff.
Beyond Compare does this (and much much more) and you can integrate it either in Subversion or Sourcesafe as an external diff tool.