Is there any diff/merge tool for programming languages that works in a syntax-aware way (like XML Diff Tool), doing more than comparing line by line (and optionally ignoring whitespace)?
I'm interested in a program that actually follows the language syntax and delimiters, suggesting changes without breaking syntactic correctness, or bundling statements spread over multiple lines. Example behavior would be:
* upon finding an if(){ which introduces an extra nesting level, automatically bundle the closing brace } several lines below with it.
* keep matching syntax elements together, and avoid the silliness that removing a block tends to create:
    int function_A()
    {
        int ret;
        ret = something;
        ret += something_else;
        return ret;
    }
    int function_B()
    {
        if(valid)
        {
            int ret;
            ret = something;
            ret += something_else;
            return ret;
        }
        else return -1;
    }
Personally, I'd love to find software capable of handling C++ syntax, but knowing about solutions for other languages would be interesting too.
Semantic Merge.
Languages supported, from the website:
We started with C# and Vb.net, then added Java. Now C is already supported and then we’ll focus on C++, Objective-C and JavaScript, depending on your feedback
While KDiff3 does not compare syntax elements in a grammar context, it does have a finer granularity than "the whole line changed": it highlights exactly which parts within a line have changed.
In my experience it also has a very good algorithm for detecting changes. Given your example above, it correctly compares function_A and function_B out of the box.
And should the algorithm fail to match what you want, you can always override it manually by placing sync marks where you want it to perform the comparison.
Sounds like you'd be interested in the Patience Diff algorithm by Bram Cohen (the creator of BitTorrent), which is used in the Bazaar version control system.
See "The diff problem has been solved" and especially "Patience Diff Advantages".
Excerpt from the second link:
Another advantage of patience diff is that it frequently doesn't match lines which just plain shouldn't match. For example, if you've completely rewritten a section of code it shouldn't match up the blank lines in each version, as this example shows. Finally, there's this example:
     void func1() {
         x += 1
     }
    +void functhreehalves() {
    +    x += 1.5
    +}
    +
     void func2() {
         x += 2
     }
Which is straightforward and obvious, but frequently diff algorithms will interpret it like this:
     void func1() {
         x += 1
    +}
    +
    +void functhreehalves() {
    +    x += 1.5
     }
     void func2() {
         x += 2
     }
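To make the idea in the excerpt concrete, here is a minimal Python sketch of a patience-style diff. It is a simplification written for illustration, not Bazaar's or Bram Cohen's actual code: it trims identical leading and trailing lines, anchors on lines that occur exactly once in both versions (keeping the longest in-order run of them), and recurses between anchors. The function names `unique_common_anchors` and `patience_diff` are made up for this sketch, and a region with no unique common lines is simply reported as removed-then-added.

```python
from bisect import bisect_left
from collections import Counter


def unique_common_anchors(a, b):
    """Pairs (i, j) of lines that occur exactly once in a and once in b,
    reduced to the longest run that is increasing in both i and j."""
    count_a, count_b = Counter(a), Counter(b)
    index_b = {line: j for j, line in enumerate(b)}
    pairs = [(i, index_b[line]) for i, line in enumerate(a)
             if count_a[line] == 1 and count_b[line] == 1]
    # Patience sorting: longest increasing subsequence over the j values.
    tails, tail_js = [], []          # best pile tops (as indexes into pairs)
    prev = [None] * len(pairs)       # back-pointers to rebuild the sequence
    for p, (_, j) in enumerate(pairs):
        k = bisect_left(tail_js, j)
        prev[p] = tails[k - 1] if k > 0 else None
        if k == len(tails):
            tails.append(p)
            tail_js.append(j)
        else:
            tails[k] = p
            tail_js[k] = j
    result = []
    p = tails[-1] if tails else None
    while p is not None:
        result.append(pairs[p])
        p = prev[p]
    return result[::-1]


def patience_diff(a, b):
    """Return diff lines prefixed with ' ', '-' or '+'."""
    # Peel off identical leading and trailing lines first.
    start = 0
    while start < len(a) and start < len(b) and a[start] == b[start]:
        start += 1
    end = 0
    while (end < len(a) - start and end < len(b) - start
           and a[len(a) - 1 - end] == b[len(b) - 1 - end]):
        end += 1
    head = [" " + line for line in a[:start]]
    tail = [" " + line for line in a[len(a) - end:]]
    mid_a, mid_b = a[start:len(a) - end], b[start:len(b) - end]

    anchors = unique_common_anchors(mid_a, mid_b)
    if not anchors:
        # Simplified fallback: report the whole region as replaced.
        body = ["-" + line for line in mid_a] + ["+" + line for line in mid_b]
        return head + body + tail

    body, prev_i, prev_j = [], 0, 0
    for i, j in anchors:
        body += patience_diff(mid_a[prev_i:i], mid_b[prev_j:j])
        body.append(" " + mid_a[i])
        prev_i, prev_j = i + 1, j + 1
    body += patience_diff(mid_a[prev_i:], mid_b[prev_j:])
    return head + body + tail


old = ["void func1() {", "    x += 1", "}", "",
       "void func2() {", "    x += 2", "}"]
new = ["void func1() {", "    x += 1", "}", "",
       "void functhreehalves() {", "    x += 1.5", "}", "",
       "void func2() {", "    x += 2", "}"]
print("\n".join(patience_diff(old, new)))
```

On this input the sketch reports the new function as one contiguous added block. Note that braces and blank lines are never used as anchors because they are not unique, which is the property the excerpt highlights.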
Beyond Compare does some of what you're asking. It doesn't maintain syntactical correctness or compare language blocks at a time, but it can do the following:
* Some understanding of language syntax, so it can do syntax highlighting of compared files, and it can also recognize and optionally ignore unimportant differences (like comments, including multiline comments).
* Support for using external conversion programs for loading and saving data. Out of the box, it supports using this to prettify XML and HTML before comparing it. You could set up GNU Indent to standardize syntax before comparing two C files.
* Optional line weights to let you give a higher weight to matching, e.g., closing braces. I've not tried this feature.
* Replacements, to ignore for a single session every place where old_variable_name on the left was replaced with new_variable_name on the right.
It's by far the best diff-and-merge tool that I've used. It's also cross platform, cheap ($30 for standard, $50 for pro), and has a very generous evaluation period, so it's worth a try.
See our SmartDifferencer tools.
SmartDifferencers are language-specific: they are driven by production-quality language parsers, build ASTs, and compare the trees. This makes them completely independent of text layout and intervening comments; remarkably, they are immune to changes in the text of literals (radix, move decimal point+change exponent, different escape sequences) if the actual value represented by the literal isn't different. The result is reported in terms of language syntax and plausible editing actions (move, copy, insert, delete, rename-identifier-within-block).
There are versions for C#, Java, C++, Python, and a variety of other languages. There are examples of each of these at the website.
A SmartDifferencer exists for C, but parsing C files without the full compiler command line is sometimes problematic, so sometimes it fails and you have to fall back to more primitive compare tools, like diff. We are working to improve this situation.
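The tree-comparison idea described above can be illustrated with a small sketch. This is not how SmartDifferencer is implemented; it merely uses Python's standard `ast` module as a stand-in parser, and the helper name `same_structure` is made up for the example. It only answers "same tree or not" and does not report edit actions.

```python
import ast


def same_structure(source_a, source_b):
    """True when both snippets parse to the same syntax tree.

    ast.dump() omits line/column attributes by default, so layout,
    redundant parentheses and comments do not influence the result.
    """
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


# Same value, different radix, layout and comment.
hex_version = "total = 0x10 + 1  # hex literal\n"
wrapped_version = "total = (16 +\n         1)\n"
# A real change: the literal's value differs.
changed_version = "total = 16 + 2\n"

print(same_structure(hex_version, wrapped_version))   # layout-only difference
print(same_structure(hex_version, changed_version))   # genuine difference
```

A plain line-based diff would flag the first pair as changed, whereas the tree comparison treats them as identical because both literals denote the same value.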
Please look at Compare++.
It can do language-aware structured comparison for C/C++, Java, C#, JavaScript, CSS, ...
It can optionally ignore comment, pure formatting, white-space and case changes, and it has the unique ability to align moved sections such as a C++ function, Java namespace, C# method, CSS selector, ...
If you are using Eclipse, the integrated compare editor provides syntax-aware diff/merge, at least for Java. Check "Open Structure Compare automatically" under the "General/Compare/Patch" preferences, then choose "Java Structure Compare" in the compare editor.
Look at https://en.wikipedia.org/wiki/Comparison_of_file_comparison_tools, especially the "Structured comparison" column.
Currently there are only two tools that understand language structure:
* Compare++ (works great for C++)
* Pretty Diff (a language-aware code comparison tool for several web-based languages; it also beautifies, minifies, and does a few other things)
Unfortunately, this column is still empty for many tools.
I'm trying to get a "retro-computing" class started and would like to give people the opportunity to finish projects at home (without carrying a 3kb monstrosity out of 1980 with them). I've heard that repl.it has every programming language. Does it have QuickBasic, and how do I use it online? Thanks in advance for the help!
You can do it (hint: search for QBasic; it shares syntax with QuickBASIC), but you should be aware that it has some limitations as it's running on an incomplete JavaScript implementation. For completeness, I'll reproduce the info from the original blog post:
What works
Only text mode is supported. The most common commands (enough to run nibbles) are implemented. These include:
* Subs and functions
* Arrays
* User types
* Shared variables
* Loops
* Input from screen
What doesn't work
* Graphics modes are not supported
* No statements are allowed on the same line as IF/THEN
* Line numbers are not supported
* Only the built-in functions used by NIBBLES.BAS are implemented
* All subroutines and functions must be declared using DECLARE
This is far from being done. In the comments, AC0KG points out that P=1-1 doesn't work.
In short, it would need another 50 or 100 hours of work and there is no reason to do this.
One caveat I haven't been able to work around involves statements like INPUT or LINE INPUT: they just don't seem to work for me on repl.it, and I don't know where else one might find qb.js hosted.
My recommendation: FreeBASIC
I would recommend FreeBASIC instead, if possible. It's essentially a modern reimplementation coded in C++ (last I knew) with additional functionality.
Old DOS stuff like the DEF SEG statement and VARSEG function are no longer applicable since it is a modern BASIC implementation operating on a 32-bit flat address space rather than 16-bit segmented memory. I'm not sure what the difference between the old SADD function and the new StrPtr function is, if there is any, but the idea is the same: return the address of the bytes that make up a string.
You could also disable some features and maintain QB compatibility by using #lang "qb" as the first line of a program, since there are noticeable differences when using the default "fb" dialect. Alternatively, you could embrace the new features and avoid the "qb" dialect, focusing primarily on the programming concepts instead; the choice is yours. Regardless of the dialect you choose, the basic stuff should work just fine:
    DECLARE SUB collatz ()
    DIM SHARED n AS INTEGER
    INPUT "Enter a value for n: ", n
    PRINT n
    DO WHILE n <> 4
        collatz
        PRINT n
    LOOP
    PRINT 2
    PRINT 1
    SUB collatz
        IF n MOD 2 = 1 THEN
            n = 3 * n + 1
        ELSE
            n = n \ 2
        END IF
    END SUB
A word about QB64
One might argue that there is a much more compatible transpiler known as QB64 (except for some things like DEF FN...), but I cannot recommend it if you want a tool for students to use. It's a large download for Windows users, and its syntax checking can be a bit poor at times, to the point that you might see the QB code compile only to see a cryptic message like "C++ compilation failed! See internals\temp\compile.txt for details". Simply put, it's usable and highly compatible, but it needs some work, like the qb.js script that repl.it uses.
An alternative: DOSBox and autorun
You could also find a way to run an actual copy of QB 4.5 in something like DOSBox and simply modify the autorun information in the default DOSBox.conf (or whatever it's called) to automatically launch QB. Then just repackage it with the modified DOSBox.conf in a nice installer for easy distribution (NSIS, Inno Setup, etc.). This will provide the most retro experience short of something like a FreeDOS virtual machine, as you'll be dealing with the 16-bit segmented memory, VGA, etc., all emulated of course.
GitHub supports syntax highlighting as follows:
```javascript
let message = 'hello world!'
```
And it supports diff as follows (but WITHOUT syntax highlighting):
```diff
-let message = 'hello world!'
+let message = 'hello stackoverflow!'
```
How can I get both syntax highlighting AND diff?
No, this is not a supported feature at this time.
GitHub documents their processing of lightweight markup languages (including Markdown, among others) in github/markup. Note step 3:
Syntax highlighting is performed on code blocks. See github/linguist for more information about syntax highlighting.
If we follow that link, we find a list of grammars that Linguist uses to provide syntax highlighting on GitHub. Linguist can only apply one of the grammars in that list to a block of code at a time. Of course, one of the grammars is Diff. However, that grammar knows nothing about the language of code being diffed, so you don't get syntax highlighting of that.
Of course, there are other languages which are often combined. For example, HTML is often included in a templating language. Therefore, in addition to the HTML grammar, we also find grammars for HTML+Django, HTML+ECR, HTML+EEX, HTML+ERB, and HTML+PHP. In each case, the single grammar is aware of two languages: the specific templating language and the HTML which is interspersed within the template.
To accomplish the same thing with a diff, you would need a separate "diff" grammar for every single language listed. In other words, the number of grammars would double. Of course, a way to avoid this might be to treat diff differently. When diff is specified, they could run the block through the syntax highlighter twice, once for diff and once for the source language. However, at least when processing code blocks in lightweight markup languages, they have not implemented such a feature.
And if they ever were to implement such a feature in the future, it would likely be more complicated than simply running the code block through twice. After all, every line of the diff has diff-specific content which would confuse the other language grammar. Therefore, every grammar would need to be diff-aware, or each line would need to be fed to the grammar separately with the diff parts removed. The problem with the latter is that the grammar would not have the context of each line and is more likely to get things wrong. Whether such a solution is possible is outside the scope of this answer, but the point is that it is reasonable to expect that such a feature would be a much lower priority to support due to the complexity involved.
So why does GitHub do syntax highlighting in other places on its website? Because, in those cases, it has access to the two source files being diffed and it generates the diff itself. Each source is first highlighted (avoiding the complexity mentioned above), then the diff is created from the two highlighted source files. However, a diff included in a Markdown code block is already a diff when GitHub first sees it. There is no way for them to highlight the pre-diff code first. In other words, the process they currently use would not be transferable to supporting the requested feature.
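The "highlight each source first, then diff" order described above can be sketched as follows. This is only an illustration of the ordering, not GitHub's or Linguist's code: the `highlight` function is a toy, line-local stand-in that wraps a few JavaScript keywords in a made-up `<kw>` tag, and `highlighted_diff` is a hypothetical helper built on Python's standard `difflib`.

```python
import difflib
import re


def highlight(line):
    """Toy highlighter (assumption): wrap a few JS keywords in <kw> tags."""
    return re.sub(r"\b(let|const|var)\b", r"<kw>\1</kw>", line)


def highlighted_diff(old_source, new_source):
    """Highlight both complete sources first, then compute the diff.

    The comparison runs on the raw lines, and the output reuses the
    lines that were highlighted before any +/- marker existed.
    """
    old_lines = old_source.splitlines()
    new_lines = new_source.splitlines()
    old_marked = [highlight(line) for line in old_lines]
    new_marked = [highlight(line) for line in new_lines]

    out = []
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out += [" " + line for line in old_marked[i1:i2]]
        else:
            # 'replace', 'delete' and 'insert' all reduce to these two slices.
            out += ["-" + line for line in old_marked[i1:i2]]
            out += ["+" + line for line in new_marked[j1:j2]]
    return out


old = "let message = 'hello world!'"
new = "let message = 'hello stackoverflow!'"
print("\n".join(highlighted_diff(old, new)))
```

Because the highlighter never sees a leading `+` or `-`, it cannot be confused by diff syntax. That only works when both original sources are available, which is exactly what a ready-made diff pasted into a Markdown code block lacks.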
You would need to post-process the output of the git diff in order to add syntax highlighting for the right language of the file being diff'ed.
But since you are asking for GitHub, that post-processing is not in your control, and is not provided by GitHub at the moment in its GFM (GitHub Flavored Markdown Spec).
It is supported for source files, in a regular diff like this one or in a PR: GitHub does the syntax highlighting of the two versions of the file, and then computes the diff.
It is not supported in a regular Markdown fenced code block, where the +/- of a diff would throw off the syntax highlighting engine, considering there is no "diff" operation done here (just the writer adding diff +/- symbols).
My question is partly linguistic, but closely related to programming (of almost anything, web pages or otherwise).
I would like to know why the word "refactor" was chosen for changing a program or part of one, when another word would probably describe the change more exactly.
IDEs (for example NetBeans or Eclipse) use this word only for renaming some part of the chosen program (project), including moving a file to another place (which, from the OS's point of view, is probably only a rename).
But renaming does not change any factor (a thing is not changed when it is renamed).
Closer to the meaning of the word "refactor" (as changing a factor) is manually rewriting some part so that its internal behaviour changes, but not what the program does as seen from outside (as described in the topic "What is refactoring and what is only modifying code?").
The word "Refactoring" is derived from mathematics where you find an equivalent expression by applying factoring again. The equivalent expression does not change the final outcome but it is much easier to understand, use, or reuse.
There are many refactoring techniques and renaming is one of them. Other techniques include extract method, extract class, move method, move class, pull/push method to super/sub-class and many more.
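The mathematical analogy can be shown with a tiny example: factoring `q*p + q*s` into `q*(p + s)` changes the shape of the expression but not its value. The function and parameter names below are invented purely for illustration.

```python
def order_cost_before(quantity, unit_price, shipping_per_item):
    # Original form: the quantity appears in both terms.
    return quantity * unit_price + quantity * shipping_per_item


def order_cost_after(quantity, unit_price, shipping_per_item):
    # Refactored form: the common factor has been pulled out.
    return quantity * (unit_price + shipping_per_item)


# Same observable behaviour for every input tried here.
for quantity in range(5):
    for unit_price in (0, 3, 10):
        for shipping in (0, 1, 4):
            assert (order_cost_before(quantity, unit_price, shipping)
                    == order_cost_after(quantity, unit_price, shipping))
```

Renaming fits the same definition: the structure is rearranged or relabelled while the result seen from outside stays the same.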
When using the compare tool, Eclipse shows this
    void foo() {
        //...
    }
as different from this
    void foo()
    {
        //...
    }
which while technically correct is annoying when comparing two versions of files that have different formatting. Is there a way to apply the current formatting to the compare view? Even if it's a different style than either of the two things being compared it would at least give a nice base for finding the "actual" differences in the code.
The only plugin I have found is this but it doesn't work with my Eclipse (Luna), probably because it was made for a much older version.
As an aside, another useful and perhaps easier option would be to ignore newline characters and tabs. Of course, this would show
foobar
and
foo
bar
as the same but it's better than nothing.
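For reference, the whitespace-insensitive comparison described in the aside amounts to something like the sketch below. It is a plain text-level illustration, unrelated to Eclipse's compare view or its settings; the helper names are made up, and it drops spaces as well as tabs and newlines, since the two brace styles above differ by a space versus a newline.

```python
def strip_whitespace(text):
    """Remove every whitespace character (spaces, tabs, newlines)."""
    return "".join(text.split())


def same_ignoring_whitespace(left, right):
    """Compare two texts after discarding all whitespace."""
    return strip_whitespace(left) == strip_whitespace(right)


same_line_brace = "void foo() {\n    //...\n}\n"
next_line_brace = "void foo()\n{\n    //...\n}\n"

print(same_ignoring_whitespace(same_line_brace, next_line_brace))
# The known drawback: separate tokens can merge into one.
print(same_ignoring_whitespace("foobar", "foo\nbar"))
```

Both comparisons report the texts as equal, which shows the usefulness for formatting-only changes as well as the false match mentioned above.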
When working on different projects, with different people and different frameworks, you often struggle to keep your code compliant with their conventions. Some teams are very strict about naming variables, methods, classes and other things, while others wage holy wars over the topic. I understand and fully support them, but like any developer I have my own preferences that I'd like to code with comfortably. This makes me wonder whether there is a simple solution.
Are there any tools or editors that can automatically convert code to follow a different standard? I imagine no tool is smart enough to convert naming conventions, and I'm OK with that, but I would really like to see
    foreach($lala as $lalala) {
and not
    foreach($lala as $lalala)
    {
same goes with statements:
    if(I_LIEK_COOKIES) {
        eat_cookie();
    } else {
        toss_cookie();
    }
and not
    if ( I_LIEK_COOKIES ) {
        eat_cookie();
    }
    else
    {
        toss_cookie();
    }
(note the spaces between and around the parentheses too)
I won't even mention spaces/tabs; I can already convert those in my IDE with a shortcut, but having it done automatically would be awesome.
So the things I would like to be able to customize are:
* spaces between parentheses
* tabs/spaces and spaces per tab
* curly braces at the end of the line or on a new line
* always attach curly braces to any if/elseif/else/for/foreach etc.
Some extras anyone would appreciate:
* Line ending style
* Deleting extra spaces at the ends of lines (like Sublime Text 2 can do on save, but it would be great for other IDEs/editors)
The perfect workflow would be like this:
1. I pull from git
2. The code gets converted to my style
3. I code stuff
4. I commit and push
5. Before everything gets pushed (or even committed), the code gets converted to the convention style
Of course, someone may not wish to use git; then the code would simply be converted when opening and after saving the file, but as I understand it, that is impossible to do with a tool outside of an IDE/editor.
Has anyone stumbled upon something like that? I could not find anything beyond tab/space conversion.
P.S. I should mention that I'm working with PHP/JS, so those are the priority, but I code in other languages in my spare time.
You could store configurations (e.g. vim .vimrcs, Eclipse preferences etc.) in each project's version control repository.
However, I think there's a big problem wrt. converting code when pushing/pulling to/from repositories. If someone reports an issue with your code (e.g. exception at line 100), converting the code when pulling from your repository is going to give you a different line 100. I don't think you can practically operate without working on the exact code that your compatriots are working with.
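The line-number concern is easy to reproduce with a small sketch. The converter below is a deliberately naive, hypothetical `to_allman` function that only moves a trailing `{` onto its own line; it is written in Python and operates on the PHP-style snippet from the question as plain text.

```python
def to_allman(source):
    """Naive converter (assumption): move a trailing '{' to its own line."""
    out = []
    for line in source.splitlines():
        stripped = line.rstrip()
        if stripped.endswith("{") and stripped.strip() != "{":
            indent = line[: len(line) - len(line.lstrip())]
            out.append(stripped[:-1].rstrip())
            out.append(indent + "{")
        else:
            out.append(line)
    return "\n".join(out)


def line_of(source, needle):
    """1-based number of the first line containing needle, or None."""
    for number, line in enumerate(source.splitlines(), start=1):
        if needle in line:
            return number
    return None


team_style = (
    "if(I_LIEK_COOKIES) {\n"
    "    eat_cookie();\n"
    "} else {\n"
    "    toss_cookie();\n"
    "}"
)
my_style = to_allman(team_style)

print(line_of(team_style, "toss_cookie"))
print(line_of(my_style, "toss_cookie"))
```

The same statement ends up on a different line in each style, so a stack trace or review comment that cites a line number in the shared version no longer points at the right place in the locally converted copy.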