How to write diff code with syntax highlight in Github - github

GitHub supports syntax highlighting as follows:
```javascript
let message = 'hello world!'
```
It also supports diff as follows (but WITHOUT syntax highlighting):
```diff
-let message = 'hello world!'
+let message = 'hello stackoverflow!'
```
How can I get both 'syntax highlighting' AND 'diff'?

No, this is not a supported feature at this time.
GitHub documents their processing of lightweight markup languages (including Markdown, among others) in github/markup. Note step 3:
Syntax highlighting is performed on code blocks. See github/linguist for more information about syntax highlighting.
If we follow that link, we find a list of grammars that Linguist uses to provide syntax highlighting on GitHub. Linguist can only apply one of the grammars in that list to a block of code at a time. Of course, one of the grammars is Diff. However, that grammar knows nothing about the language of code being diffed, so you don't get syntax highlighting of that.
Of course, there are other languages which are often combined. For example, HTML is often included in a templating language. Therefore, in addition to the HTML grammar, we also find grammars for HTML+Django, HTML+ECR, HTML+EEX, HTML+ERB, and HTML+PHP. In each case, the single grammar is aware of two languages: the specific templating language and the HTML which is interspersed within the template.
To accomplish the same thing with a diff, you would need a separate "diff" grammar for every single language listed. In other words, the number of grammars would double. Of course, a way to avoid this might be to treat diff differently. When diff is specified, they could run the block through the syntax highlighter twice, once for diff and once for the source language. However, at least when processing code blocks in lightweight markup languages, they have not implemented such a feature.
And if they were ever to implement such a feature, it would likely be more complicated than simply running the code block through twice. After all, every line of the diff has diff-specific content which would confuse the other language grammar. Therefore, every grammar would need to be diff-aware, or each line would need to be fed to the grammar separately with the diff parts removed. The problem with the latter is that the grammar would not have the context of each line and would be more likely to get things wrong. Whether such a solution is possible is outside the scope of this answer, but the point is that it is reasonable to expect such a feature to be a much lower priority because of the complexity involved.
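The second approach, feeding each line to the language grammar with the diff parts removed, can be sketched in a few lines of Python. This is only an illustration: `split_diff_line` is a hypothetical helper, not something GitHub or Linguist provides, and it assumes a bare `+`, `-` or space prefix with no hunk headers.

```python
def split_diff_line(line: str) -> tuple[str, str]:
    """Separate a diff line into its marker ('+', '-' or ' ') and the code after it."""
    if line[:1] in ("+", "-", " "):
        return line[0], line[1:]
    # Assumed fallback: treat anything else as unchanged context.
    return " ", line


diff_block = [
    "-let message = 'hello world!'",
    "+let message = 'hello stackoverflow!'",
]

# Each code fragment could now be handed to a language highlighter on its own,
# but the highlighter would see every line in isolation, without its context.
pairs = [split_diff_line(line) for line in diff_block]
```

The loss of context is visible here: a line inside a multi-line string or comment would reach the highlighter with no hint of that fact.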
So why is GitHub able to syntax-highlight diffs in other places on its website? Because, in those cases, it has access to the two source files being diffed and it generates the diff itself. Each source is first highlighted (avoiding the complexity mentioned above), then the diff is created from the two highlighted source files. However, a diff included in a Markdown code block is already a diff when GitHub first sees it. There is no way for them to highlight the pre-diff code first. In other words, the process they currently use would not be transferable to supporting the requested feature.
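The "highlight first, then diff" order described above can be sketched with Python's standard `difflib`. This is not GitHub's implementation: `highlight` is a hypothetical stand-in that only tags the `let` keyword, and the `<kw>` markup is invented for the example.

```python
import difflib


def highlight(line: str) -> str:
    # Hypothetical stand-in for a real highlighter: it only tags the `let` keyword.
    return line.replace("let ", "<kw>let</kw> ", 1)


def diff_highlighted(old: list[str], new: list[str]) -> list[str]:
    """Highlight both versions first, then diff the highlighted lines."""
    old_hl = [highlight(line) for line in old]
    new_hl = [highlight(line) for line in new]
    # ndiff also emits "? " hint lines; drop them to keep only -, + and context lines.
    return [d for d in difflib.ndiff(old_hl, new_hl) if not d.startswith("? ")]


result = diff_highlighted(
    ["let message = 'hello world!'"],
    ["let message = 'hello stackoverflow!'"],
)
```

Because the highlighter runs on complete files, it never sees a `+` or `-` marker. That is exactly what is unavailable when the input is already a diff.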

You would need to post-process the output of the git diff in order to add syntax highlighting for the right language of the file being diff'ed.
But since you are asking for GitHub, that post-processing is not in your control, and is not provided by GitHub at the moment in its GFM (GitHub Flavored Markdown Spec).
It is supported for source files, in a regular diff like this one or in a PR: GitHub does the syntax highlighting of the two versions of the file, and then computes the diff.
It is not supported in a regular Markdown fenced code block, where the +/- of a diff would throw off the syntax highlighting engine, since no "diff" operation is performed there (the writer is simply adding diff +/- symbols by hand).

Related

Context dependent syntax highlighting in vs code, sublime or atom

I'm developing a language to write down mathematical proofs for proof checking. (To be more accurate: it is a language to write down derivations in Post canonical systems)
Is it possible to create syntax highlighting for VS Code, Sublime Text or Atom which will do something like this:
So I'm asking for context dependent syntax highlighting. Only the "declared" variables should be highlighted (until the next variable declaration).
Of course, with a custom lexer in Pygments it is possible. But the TextMate-like grammars seem to be too limited. If this is the case: Is there a code editor that can use something like Pygments to highlight code?
Remark:
It isn't important what happens if one variable is an initial segment of another.
The semantical content of the example is nonsense, of course.

Syntax highlighting on GitHub's Wiki: Specifying the programming language

GitHub uses something known as the "GitHub Flavored Markdown" for messages, issues and comments. My questions are:
Does GitHub also use this syntax for their Wiki?
From what I understand one can specify the programming language for syntax highlighting using the following syntax:
```ruby
require 'redcarpet'
markdown = Redcarpet.new("Hello World!")
puts markdown.to_html
```
Where one can specify the programming language after the ``` string (e.g. ```ruby)
My question is: How do I look up the specifier for a programming language? (e.g. C does not seem to work for the C programming language)
For a list of the possible lexers that github wiki can use see here: http://pygments.org/docs/lexers/
If you find that a certain lexer is not supported, github recommends forking their code and submitting it via a pull request: https://github.com/blog/774-git-powered-wikis-improved
Quoting GitHub's documentation on the subject:
We use Linguist to perform language detection and syntax highlighting. You can find out which keywords are valid in the languages YAML file.
Linguist's "Grammar index" may also prove useful.
How do I look up the specifier for a programming language?
The up-to-date list of language specifiers can be deduced from the main configuration file of the Linguist repository, languages.yml. For each language in that list, you can use as specifiers:
The language name
Any of the language aliases
Any of the file extensions, with or without a leading dot (.).
White spaces must be replaced by dashes (e.g., emacs-lisp is one specifier for Emacs Lisp). Languages with a tm_scope: none entry don't have a grammar defined and won't be highlighted on github.com.
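The rules above can be sketched as a small Python function. It is an illustration only: the `aliases` and `extensions` keys are assumed to mirror the shape of a `languages.yml` entry, the sample data is hand-written rather than read from the real file, and lowercasing the specifier is an assumption made for the example.

```python
def specifiers(name: str, entry: dict) -> list[str]:
    """List code-fence specifiers for one language entry shaped like languages.yml."""
    candidates = [name, *entry.get("aliases", [])]
    for ext in entry.get("extensions", []):
        # Extensions are accepted with or without the leading dot.
        candidates += [ext, ext.lstrip(".")]
    seen, result = set(), []
    for candidate in candidates:
        # Replace runs of white space with dashes (lowercasing is an assumption).
        spec = "-".join(candidate.lower().split())
        if spec not in seen:
            seen.add(spec)
            result.append(spec)
    return result


# Hand-written sample entry, for illustration only.
sample = {"aliases": ["elisp", "emacs"], "extensions": [".el", ".emacs"]}
emacs_specs = specifiers("Emacs Lisp", sample)
```

For the sample entry this yields `emacs-lisp`, `elisp`, `emacs`, `.el`, `el` and `.emacs`.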
jmm made this reverse engineering effort and received confirmation from one of GitHub's engineers. He opened an issue with all the information on Linguist and maintains a wiki page with the specifiers for all languages (which might not be up-to-date).
Does GitHub also use this syntax for their Wiki?
Yes, but github.com's wikis also support several other formats. You can find the complete list on the markup repository.

Is there a diff tool (patch) that is aware of indentation?

I'm regularly using the gnu-utils patch and diff. Using git, I often do:
git diff
Often simple changes create a large patch because the only thing that changed was, for example, the addition of an if/else block, so everything inside it is indented to the right.
Reviewing such a patch can be cumbersome because only a manual line-by-line comparison can show whether anything has essentially changed within the indented code. We may be talking about only a few lines of code, or about dozens (or many more) lines of nested code. (I know: such a hypothetically large function would be better split into smaller functions, but that's beside the point.)
Can't GNU diff/patch be aware when the only change within a code block is the indentation and let the developer know as much?
Are there any other diff tools that operate this way?
Edit: OK, there is --ignore-space-change, but then we are in an either/or situation: either we have a more human-readable patch or we have a complete patch that the machine knows how to read. Can't we have the best of both worlds, with a more elaborate diff tool that shows whitespace changes to the human for what they are while still allowing the machine to apply the patch fully?
With GNU diff you can pass -b or --ignore-space-change to ignore changes in the amount of white space in a patch.
If you use emacs and have been sent a patch, you can also use M-x diff-ignore-whitespace-hunk to reformat the patch to ignore white space in a particular hunk. Or diff-refine-hunk to highlight changes at a character by character level, which tends to point out the "meat" of a change.
As for applying patches, you can use the -l or --ignore-whitespace with GNU patch to ignore tabs and spaces changes. Just be careful with Python code :-)
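To illustrate the idea of separating indentation-only changes from real ones, here is a minimal Python sketch built on the standard `difflib`. It is not a reimplementation of `diff -b`: it simply ignores leading and trailing whitespace on every line, and `real_changes` is a hypothetical name.

```python
import difflib


def real_changes(old: str, new: str) -> list[str]:
    """Report added/removed lines, ignoring leading and trailing whitespace."""
    a = [line.strip() for line in old.splitlines()]
    b = [line.strip() for line in new.splitlines()]
    changes = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            changes += ["-" + line for line in a[i1:i2]]
        if tag in ("replace", "insert"):
            changes += ["+" + line for line in b[j1:j2]]
    return changes


old = "x = 1\nprint(x)\n"
new = "if ready:\n    x = 1\n    print(x)\n"
summary = real_changes(old, new)
```

Here only the new `if ready:` line is reported; the two re-indented lines are treated as unchanged. As noted above, this would be unsafe as the only check for Python code, where indentation carries meaning.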
For what it's worth, using git difftool with a tool like meld or xxdiff makes the diff much more readable.
I don't know about git diff. But a diff-like tool that understands not just indentation but in fact any layout changes in your target language is our Smart Differencer.
This tool parses the before- and after- versions of your code the same way a compiler does, and compares the resulting syntax trees, so it isn't affected by whitespace changes of any kind (except semantically important whitespace such as Python indentation), inserted or deleted comments, or even a change of radix on constants.
The result is reported in terms of programmer editing actions ("move, insert, delete, copy, rename") over language structures (expressions, statements, declarations, blocks, methods, ...) rather than "insert line" or "delete line".
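The general idea of comparing syntax trees instead of lines can be illustrated with Python's own `ast` module. This is a sketch of the technique only, limited to Python source; it says nothing about how the tool mentioned above works internally and it does not report edit actions.

```python
import ast


def same_structure(before: str, after: str) -> bool:
    """True when two Python sources parse to the same syntax tree."""
    # ast.dump omits line/column attributes by default, so layout is ignored.
    return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))


layout_only = same_structure("x = (1 +\n     2)  # note\n", "x = 1 + 2\n")
radix_only = same_structure("x = 0x10\n", "x = 16\n")
real_edit = same_structure("x = 1\n", "x = 2\n")
```

Comments, line breaks and the radix of a constant do not reach the tree, so the first two comparisons report no difference while the third one does.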
I try to not do file-wide indentation changes in the same commit as some other changes. And I commit the indentation changes in a separate commit before or after, with a commit message of "Changed indentation only.", to make it clear so that no manual diff inspection is needed, to see if something else was changed.

Code formatting and source control diffs

What source control products have a "diff" facility that ignores white space, braces, etc., in calculating the difference between checked-in versions? I seem to remember that Clearcase's diff did this but Visual SourceSafe (or at least the version I used) did not.
The reason I ask is probably pretty typical. Four perfectly reasonable developers on a team have four entirely different ways of formatting their code. Upon checking out the code last changed by someone else, each will immediately run some kind of program or editor macro to format things the way they like. They make actual code changes. They check-in their changes. They go on vacation. Two days later that program, which had been running fine for two years, blows up. The developer assigned to the bug does a diff between versions and finds 204 differences, only 3 of which are of any significance, because the diff algorithm is lame.
Yes, you can have coding standards. Most everyone finds them dreadful. A solution where everyone can have their cake and eat it too seems far more preferable.
=========
EDIT: Thanks to everyone for some great suggestions.
What I take away from this is:
(1) A source control system with plug-in type diffs is preferable.
(2) Find a diff with suitable options.
(3) Use a good source formatting program and settle on a check-in standard.
Sounds like a plan. Thanks again.
Git does have these options:
--ignore-space-at-eol
Ignore changes in whitespace at EOL.
-b, --ignore-space-change
Ignore changes in amount of whitespace. This ignores whitespace at line end, and considers all other sequences of one or more whitespace characters to be equivalent.
-w, --ignore-all-space
Ignore whitespace when comparing lines. This ignores differences even if one line has whitespace where the other line has none.
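The three descriptions can be read as three different line normalizations. The Python sketch below follows the wording quoted above; it is not Git's implementation, and `normalize` and `equal_under` are hypothetical names.

```python
import re


def normalize(line: str, mode: str) -> str:
    """Normalize one line the way the quoted option descriptions read."""
    if mode == "ignore-space-at-eol":
        return line.rstrip()
    if mode == "ignore-space-change":  # -b
        # Drop whitespace at line end, collapse every other run to one space.
        return re.sub(r"\s+", " ", line.rstrip())
    if mode == "ignore-all-space":  # -w
        return re.sub(r"\s+", "", line)
    raise ValueError(f"unknown mode: {mode}")


def equal_under(a: str, b: str, mode: str) -> bool:
    """Compare two lines after applying the chosen normalization."""
    return normalize(a, mode) == normalize(b, mode)
```

For example, `"x=1"` and `"x = 1"` compare equal only under `ignore-all-space`, because the other two modes still distinguish "some whitespace" from "none".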
I am not sure if brace changes can be ignored using Git's diff.
If it is C/C++ code, you can define Astyle rules and then convert the source code's brace style to the one that you want, using Astyle. A git diff will then produce sane output.
Choose one (dreadful) coding standard, write it down in some official coding standards document, and get on with your life. Messing with whitespace is not productive work.
And remember you are a professional developer: it's your job to get the project done. Changing anything in the code because of a personal style preference hurts the project. It won't only make diffing more difficult, it can also introduce hard-to-find problems if your source formatter or compiler has bugs (and your fancy diff tool won't save you when two co-workers start fighting over casing).
And if someone just doesn't agree to work with the selected style, just remind him (or her) that he is programming as a profession, not as a hobby; see http://www.ericsink.com/entries/No_Great_Hackers.html
Maybe you should choose one format and run some indentation tool before checking in so that each person can check out, reformat to his/her own preferences, do the changes, reformat back to the official standard and then check in?
A couple of extra steps but they already use indentation tools when working. Maybe it can be a triggered check-in script?
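A check-in trigger along these lines could be sketched as follows. The "official standard" here (tabs expanded to four spaces, trailing whitespace removed) is a made-up stand-in for whatever a real formatting tool would enforce, and both function names are hypothetical.

```python
def to_team_standard(source: str, tab_width: int = 4) -> str:
    """Hypothetical 'official' format: tabs expanded, trailing whitespace removed."""
    lines = [line.expandtabs(tab_width).rstrip() for line in source.splitlines()]
    return "\n".join(lines) + "\n"


def ready_for_check_in(source: str) -> bool:
    """A check-in trigger could reject files that are not already in the standard."""
    return source == to_team_standard(source)


personal_style = "if x:\n\ty = 1  \n"
official_style = to_team_standard(personal_style)
```

Each developer can reformat freely while working; only the normalized text reaches the repository, so diffs between checked-in versions show real changes.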
Edit: this would perhaps also solve the brace problem.
(I haven't tried this solution myself, hence the "perhaps" and "maybes", but I have been in projects with the same problems, and it is a pain to go through diffs with hundreds of irrelevant changes that are not limited to whitespace but include the formatting itself.)
As explained in Is it possible for git-merge to ignore line-ending differences?, it is more a matter of associating the right diff tool with your favorite VCS than of relying on the right VCS option (even if Git does have some options regarding whitespace, like the ones mentioned in Alan's answer, they will never be as complete as one would like).
DiffMerge is the most complete on those "ignore" options, as it can ignore not only spaces but also other "variations" based on the programming language used in a given file.
Subversion apparently supports this, either natively in the latest versions, or by using an alternate diff like GNU diff.
Beyond Compare does this (and much much more) and you can integrate it either in Subversion or Sourcesafe as an external diff tool.

How well does Python's whitespace dependency interact with source control with regards to merging?

I'm wondering if the need to alter the indentation of code to adjust the nesting has any adverse effects on merging changes in a system like SVN.
I've used python with SVN and Mercurial, and have no hassles merging.
It all depends on how the diffing is done - and I suspect that it is character-by-character, which would notice the difference between one level of indent and another.
It works fine so long as everyone on the project has agreed to use the same whitespace style (spaces or tabs).
But I've seen cases where a developer has converted an entire file from spaces to tabs (I think Eclipse had that as a feature, bound to Ctrl+Tab!), which makes spotting diffs near impossible.
Generally source control systems merge on a line-by-line basis by default. I have found that merging Python code is no different from merging any other source code that is reasonably indented. If one programmer wraps a block of code in an if statement (indenting the whole block), and another programmer modifies something inside the block, then there will be a merge conflict. Fortunately, the conflict in this case is super easy to resolve.
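The scenario above can be reproduced with a simplified, line-based conflict check in Python. This is a sketch under stated assumptions, not how any particular version control system merges: it only records which base lines each side replaced or deleted (pure insertions are ignored), and `changed_base_lines` is a hypothetical helper.

```python
import difflib


def changed_base_lines(base: list[str], other: list[str]) -> set[int]:
    """Indices of base lines that `other` replaced or deleted."""
    changed = set()
    matcher = difflib.SequenceMatcher(None, base, other, autojunk=False)
    for tag, i1, i2, _, _ in matcher.get_opcodes():
        if tag != "equal":
            changed.update(range(i1, i2))
    return changed


base = ["x = compute()", "print(x)"]
ours = ["if ready:", "    x = compute()", "    print(x)"]  # wrapped in an if
theirs = ["x = compute()", "print(x + 1)"]                 # edited inside the block

# Lines touched by both sides are where a line-by-line merge would conflict.
conflicts = changed_base_lines(base, ours) & changed_base_lines(base, theirs)
```

Re-indenting counts as changing every line of the block, so the one line edited by the other programmer shows up on both sides.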
If you use an external merge tool, then your tool may support more detailed textual merging algorithms that take the above scenario into account automatically.