Syntax highlighting on GitHub's Wiki: Specifying the programming language

Syntax highlighting on GitHub's Wiki: Specifying the programming language - github

GitHub uses something known as the "GitHub Flavored Markdown" for messages, issues and comments. My questions are:
Does GitHub also use this syntax for their Wiki?
From what I understand one can specify the programming language for syntax highlighting using the following syntax:
```ruby
require 'redcarpet'
markdown = Redcarpet.new("Hello World!")
puts markdown.to_html
```
Where one can specify the programming language after the ``` string (e.g. ```ruby)
My question is: How do I look up the specifier for a programming language? (e.g. C does not seem to work for the C programming language)

For a list of the possible lexers that github wiki can use see here: http://pygments.org/docs/lexers/
If you find that a certain lexer is not supported, github recommends forking their code and submitting it via a pull request: https://github.com/blog/774-git-powered-wikis-improved

Quoting GitHub's documentation on the subject:
We use Linguist to perform language detection and syntax highlighting. You can find out which keywords are valid in the languages YAML file.
Linguist's "Grammar index" may prove also useful.

How do I look up the specifier for a programming language?
The up-to-date list of language specifiers can be deduced from the main configuration file of the Linguist repository, languages.yml. For each language in that list, you can use as specifiers:
The language name
Any of the language aliases
Any of the file extensions, with or without a leading ..
White spaces must be replaced by dashes (e.g., emacs-lisp is one specifier for Emacs Lisp). Languages with a tm_scope: none entry don't have a grammar defined and won't be highlighted on github.com.
jmm made this reverse engineering effort and received confirmation from one of GitHub's engineers. He opened an issue with all the information on Linguist and maintains a wiki page with the specifiers for all languages (which might not be up-to-date).
Does GitHub also use this syntax for their Wiki?
Yes, but github.com's wikis also supports several other formats. You can find the complete list on the markup repository.

Related

How to write diff code with syntax highlight in Github

Github supports syntax highlight as follows:
```javascript
let message = 'hello world!'
```
And it supports diff as follows: (but WITHOUT syntax highlight)
```diff
-let message = 'hello world!'
+let message = 'hello stackoverflow!'
```
How can I get both 'syntax hightlight' AND 'diff' ?

No, this is not a supported feature at this time.
GitHub documents their processing of lightweight markup languages (including Markdown, among others) in github/markup. Note step 3:
Syntax highlighting is performed on code blocks. See github/linguist for more information about syntax highlighting.
If we follow that link, we find a list of grammars that Linguist uses to provide syntax highlighting on GitHub. Linguist can only apply one of the grammars in that list to a block of code at a time. Of course, one of the grammars is Diff. However, that grammar knows nothing about the language of code being diffed, so you don't get syntax highlighting of that.
Of course, there are other languages which are often combined. For example, HTML is often included in a templating language. Therefore, in addition to the HTML grammar, we also find grammars for HTML+Django, HTML+ECR HTML+EEX, HTML+ERB, and HTML+PHP. In each case, the single grammar is aware of two languages. Both the specific templating language and the HTML which is interspersed within the template.
To accomplish the same thing with a diff, you would need a separate "diff" grammar for every single language listed. In other words, the number of grammars would double. Of course, a way to avoid this might be to treat diff differently. When diff is specified, they could run the block through the syntax highlighter twice, once for diff and once for the source language. However, at least when processing code blocks in lightweight markup languages, they have not implemented such a feature.
And if they ever were to implement such a feature in the future, it would likely be more complicated that simply running the code block through twice. After all, every line of the diff has diff specific content which would confuse the other language grammar. Therefore, every grammar would need to be diff aware, or each line would need to be fed to the grammar separately with the diff parts removed. The problem with the later is that the grammar would not have the context of each line and is more likely to get things wrong. Whether such a solution is possible is outside this cope of this answer, but the point is that it is reasonable to expect that such a feature would be much lower priority to support due to the complexity involved.
So why does GitHub do syntax highlighting in other places on its website? Because, in those cases, it has access to the two source files being diffed and it generates the diff itself. Each source is first highlighted (avoiding the complexity mentioned above), then the diff is created from the two highlighted source files. However, a diff included in a Markdown code block is already a diff when GitHub first sees it. There is no way for them to highlight the pre-diff code first. In other words, the process they currently use would not be transferable to supporting the requested feature.

You would need to post-process the output of the git diff in order to add syntax highlighting for the right language of the file being diff'ed.
But since you are asking for GitHub, that post-processing is not in your control, and is not provided by GitHub at the moment in its GFM (GitHub Flavored Markdown Spec).
It is supported for source files, in a regular diff like this one or in a PR: GitHub does the syntax highlighting of the two versions of the file, and then computes the diff.
It is not supported in a regular markdown fenced code block, where the +/- of a diff would throw off the syntax highlighting engine, considering there is no "diff" operation done here (just the writer trying to add diff +/- symbols)

Why doesn't GitHub like the % character?

I have a GitHub repo with multiple C source files. (I won't share a link unless absolutely necessary so that I can't be accused of advertising.) Every instance of the % character in the C files is highlighted red:
Am I missing something about % in C, is this a bug, or is it intentional?

GitHub uses linguist for detecting languages, and some highlighting issues can be found there (even if it does not directly concern the language detection module)
See issue 2839 which does mention
We use open source TextMate-style language grammars for syntax highlighting, which are available here:
https://github.com/github/linguist/blob/master/grammars.yml
Linguist pulls in grammar updates with each new release, which usually happens every couple of weeks.
For C, is is textmate/c.tmbundle, which had a percent-related highlighting issue before (issue 28): you might have to open a new issue there.

I found this discussion has a plausible explanation. Here I quote:
It's highlighting the % because it's assuming that you're making a printf format string, and that you've made it wrong. Unfortunately there's no way to tell it it's not a printf format string short of changing the syntax file.

Cite a paper using github markdown syntax

I am coding a readme for a repo in github, and I want to add a reference to a paper. What is the most adequate way to code in the citation? e.g. As a blockquote, as code, as simple text, etc?
Suggestions?

I agree with Horizon_Net that it depends on personal preference.
I like to have something which looks similar to LaTeX.
An example is provided below.
Note that it demonstrates a numeric citation style.
Alphabetic or reading style are possible too.
For numeric citation style, higher numbers should appear later in the text, and this can make satisfying numeric citation style cumbersome.
To avoid this problem, I typically use an alphabetic citation style.
"...the **go to** statement should be abolished..." [[1]](#1).
## References
<a id="1">[1]</a>
Dijkstra, E. W. (1968).
Go to statement considered harmful.
Communications of the ACM, 11(3), 147-148.
"...the go to statement should be abolished..." [1].
References
[1]
Dijkstra, E. W. (1968).
Go to statement considered harmful.
Communications of the ACM, 11(3), 147-148.
On GitHub flavored Markdown and most other Markdown flavors, you can actually click on [1] to jump to the reference.
Apologies for taking Dijkstra his sentence out of context.
The full sentence would make this example more difficult to read.
EDIT:
If the references all have a stable link, it is also possible to use those:
The field of natural language processing (NLP) has become mostly dominated by deep learning approaches
(Young et al., [2018](https://doi.org/10.1109/MCI.2018.2840738)).
Some are based on transformer neural networks
(e.g., Devlin et al, [2018](https://arxiv.org/abs/1810.04805)).
The field of natural language processing (NLP) has become mostly dominated by deep learning approaches
(Young et al., 2018).
Some are based on transformer neural networks
(e.g., Devlin et al, 2018).

From what I know, there is no built-in mechanism for this. This leads to more subjective opinions, depending on personal preference. I personally like to have a separate section called references. An example would look like the following
References
some reference
another reference
Update
Another way would be to use simple HTML embedded in your Markdown.

The alternative (to pure markdown) is to use the new GitHub integration from Aug. 2021:
Enhanced support for citations on GitHub
GitHub now has built-in support for CITATION.cff files.
This new feature enables academics and researchers to let people know how to correctly cite their work, especially in academic publications/materials.
Originally proposed by the research software engineering community, CITATION.cff files are plain text files with human- and machine-readable citation information.
When we detect a CITATION.cff file in a repository, we use this information to create convenient APA or BibTeX style citation links that can be referenced by others.
How this works
Under the hood, we’re using the ruby-cff RubyGem to parse the contents of the CITATION.cff file and build a citation string that is then shown in GitHub when someone browses a repository with one of those files1.
Now, that is helping others making your Github paper easily citable.
But, as documented, to "add a reference to a paper":
Citing something other than software
If you would prefer the GitHub citation information to link to another resource such as a research article, then you can use the preferred-citation override in CFF with the following types.
Resource
Type
Research article
article
Conference paper
conference-paper
Book
book
Extract
preferred-citation:
type: article
authors:
- family-names: "Lisa"
given-names: "Mona"
orcid: "https://orcid.org/0000-0000-0000-0000"

Xtext: grammar for language with significant/semantic whitespace

How can I use Xtext to parse languages with semantic whitespace? I'm trying to write a grammar for CoffeeScript and I can't find any good documentation on this.

Here's an example whitespace sensitive language in XText

AFAIK, you can't.
In case of parsing Python-like languages, you'd need the lexer to emit INDENT and DEDENT tokens. For that to happen, you'd need semantic predicates to be supported inside lexer rules (Xtext's terminal rules) that would first check if the current-position-in-line of the next character int the input equals 0 (the beginning of the line) and is a ' ' or '\t'.
But browsing through the documentation, I don't see this is supported by Xtext at the moment. Since Xtext 2.0, support has been added for semantic predicates in production rules (see: 6.2.8. Syntactic Predicates), but not in terminal rules.
The only way to do this with Xtext would be to let the lexer produce terminal spaces and line-breaks, but this would make an utter mess of your production rules.
If you want to parse such a language using Java (and a Java oriented parser generator) I'd recommend ANTLR, in which you can emit such INDENT and DEDENT tokens quite easily. But if you're keen on Eclipse integration, then I don't see how you'd be able to do this using Xtext, sorry.

Version 2.8 of Xtext comes with support for Whitespace-Aware Languages. This version ships with the "Home Automation Example" that you can use as a template.

For people interested in CoffeeScript, Adam Schmideg has an Eclipse plugin that uses XText.
For people interested in parsing Python-like DSL's in XText, Ralf Ebert's code for Todotext mentioned above is no longer available from Github but you can find it in the Eclipse test repository. See the original thread about this work and the Eclipse issue that was raised about it.
I have been playing with this code today and my conclusion is it no longer works in the current version of XText. When XText is used in Eclipse, I think it does "partial parsing". This is not compatible with the stateful lexer you need to process indentation sensative languages. So I suspect even if you patch the lexer, the Eclipse editor does not work. In the issue, it looks like Ralf proposed patches to address these issues, but looking into the XText source, these changes seem long gone? If I am wrong and someone can get it to work, I would be very interested?
There is a different implementation here but I cannot get that to work with the current version of XText either.
Instead I have switched to parboiled which does supports indentation based grammars out the box.

What are some useful emacs functions for refactoring?

For now I've stuck with multi-occur-in-matching-buffers and rgrep, which, while powerful, is still pretty basic I guess.
Eventhough I realize anything more involved than matching a regexp and renaming will need to integrate with CEDET's semantic bovinator, I feel like there is still room for improvement here.
Built-in functions, packages, or custom-code what do you find helpful getting the job done ?
Cheers

In CEDET, there is a symbol reference tool. By default it also uses find/grep in a project to find occurrence of a symbol. It is better to use GNU Global, IDUtils, or CScope instead to create a database in your project. You can then use semantic-symref-symbol which will then use gnu global or whatever to find all the references.
Once in symref list buffer, you can look through the hits. You can then select various hits and perform operations such as symbol rename, or the more powerful, execute macro on all the hits.
While there are more focused commands that could be made, the macro feature allows almost anything to happen for the expert user who understands Emacs keyboard macros well.

It depends on which language you are using; if your language is supported by slime, there are the family of who commands: slime-who-calls, who-references, who-binds, calls-who, etc. They provide real, semantic based information, so are more reliable than regexp matching.

If you're editing lisp, I've found it useful (in general) to use the paredit.el package. Follow the link for documentation, and the video is a great introduction.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse