Can I configure Jupyter Notebook to split source files and generated files?

I really like Jupyter Notebooks.
However, working with them in conjunction with a source control system like git is cumbersome, because an .ipynb file contains both the source code (what you actually write in the notebook) and the generated output: text / HTML / images / metadata / ...
For example, merge conflicts are difficult to resolve, because everything is stored in one huge file with lots of generated data.
I wonder if I can configure Jupyter to store notebooks as
A source file: For example, I imagine this to be a Markdown file where everything surrounded by three backticks (```) is interpreted as a code cell. Diffs of that file would be meaningful and merge conflicts would be simple to resolve manually.
A generated file: This contains everything else. If there is a merge conflict within this file, it can be resolved by regenerating it.
Is this possible?

For reference: There is a slightly more general version of this question which lists various efforts at adapting IPython and Jupyter to this effect, and this answer proposes to solve the problem via Git. There is a Github project with a Git filter based on that answer, and (in its edit at the end) the answer links a few similar tools like nbstripout.
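As a rough sketch of the Git-filter approach those links describe (assuming the nbstripout tool is installed, e.g. via pip install nbstripout): the filter strips output and metadata from notebooks as they are committed, while your working copy keeps the generated output.

# One-time setup in the repository (nbstripout --install does
# roughly this for you):
git config filter.nbstripout.clean nbstripout
git config filter.nbstripout.smudge cat

# .gitattributes: run every notebook through the filter
*.ipynb filter=nbstripout

This gives you clean diffs of the source, though it stores only the source in Git rather than a separate generated file.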

Related

Github incorrectly detects Languages of my project as "Roff"

In one of my repositories nearly all of my code is Python and some HTML.
However, Github thinks otherwise:
What causes that?
You were creating files through a script with an unintended extension: the script was inserting an extra dot in the file name.
Simply rename your file my_file_0.5ms to my_file_05ms.txt and it will display the correct languages.
What you could do to avoid similar problems in the future is use a script that detects extensions and the total lines of code for each extension, as sketched below.
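A minimal sketch of such a script (plain Python 3, standard library only; walking the current directory is an assumption):

import os
from collections import Counter

ext_files = Counter()  # number of files per extension
ext_lines = Counter()  # total lines per extension

for root, _dirs, files in os.walk("."):
    if ".git" in root.split(os.sep):
        continue  # skip repository internals
    for name in files:
        _, ext = os.path.splitext(name)
        ext = ext or "<none>"
        try:
            with open(os.path.join(root, name), errors="ignore") as f:
                ext_lines[ext] += sum(1 for _ in f)
        except OSError:
            continue  # unreadable file; ignore it
        ext_files[ext] += 1

for ext, lines in ext_lines.most_common():
    print(f"{ext:>12}  {ext_files[ext]:5d} files  {lines:7d} lines")

Running this in the repository root makes stray extensions like .5ms stand out immediately.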
Solution
GitHub Linguist is the culprit in this situation, but luckily, it can be easily resolved in a number of ways.
Create a .gitattributes file and list patterns that match the files you want to ignore, appending either linguist-vendored or linguist-documentation:
specific-file.5ms linguist-vendored
*.5ms linguist-vendored
specific-folder/* linguist-vendored
This will remove the files from your GitHub repository's statistics on the next run of Linguist (it may take some time).
Notes
If you'd like to attribute these files to a specific language, you can do that using linguist-language={name}. Full documentation on overriding Linguist can be found here.
You can also run Linguist on your own computer, but note that any changes to .gitattributes will not take effect until you commit to your repository. Linguist will not see changes that exist only in the index.
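As an example of the linguist-language override mentioned in the first note (using the stray pattern from above), the following .gitattributes line would count those files as Python instead of hiding them:

*.5ms linguist-language=Python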

Cond statement doxygen does not work

I am trying to separate internal and external documentation using the doxygen \cond construct, but I just can't seem to get it working. I would essentially like to exclude some files completely, not conditionally. Regardless of where I add the tag (before the include, before the header guards, etc.), both the files and the source show up.
What I have tried, in vain, is to take the test file for conditional sections from the doxygen repo and add it to the project.
Steps to reproduce [Linux]
create a new directory.
copy-paste the above file (I had to rename it to .h, as .c was passed over?).
generate a dummy config via doxygen -g.
update the Doxyfile: ENABLED_SECTIONS = COND_ENABLED.
run doxygen.
check html/index.html
The cond_enabled function, however, is still visible in the HTML documentation generated for the project. I have set the ENABLED_SECTIONS variable to other values, but the cond_enabled function still shows up. Running the testing directory of the (doxygen) project passes. So I am lost.
Any suggestions?
Tried with latest version 1.8.14.
Thanks!
Regarding the \cond problems (not directly an answer to the real problem you face, I think, but too long for a comment).
The mentioned file is used in the (limited) testing doxygen does, and its first lines contain some instructions on what to do. Furthermore, there is a default Doxyfile in use with the tests. It is hard to run a separate test outside the doxygen build tree.
Regarding the remark "Running the testing directory of the project (doxygen) it passes": this is correct. At the moment testing is done only against the XML output, which is compared to a previously recorded version of that output. No tests are currently done in respect of HTML or PDF / LaTeX. Recently the test framework was slightly extended, so in the future this should be possible (comparing the xhtml and tex output), but some work still has to be done here.
The current version of the parser sees the \cond in the first line (a normal C comment) as a doxygen command and skips everything until the first \endcond (your friend in these cases is always doxygen -d preprocessor). I think that removing or modifying the first line will already give a better result. There is, however, another hiccup for e.g. HTML output: as the function cond_enabled is not documented and EXTRACT_ALL is not set to YES, the function will not appear in the documentation. So it is best to also add a line of documentation to the function cond_enabled.
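To illustrate the intended behaviour, here is a minimal, hypothetical header (not the repo's test file) with \cond in its own doxygen comment and the guarded function documented:

/** \file example.h */

/** Always extracted. */
void always_visible(void);

/// \cond COND_ENABLED
/** Extracted only when ENABLED_SECTIONS contains COND_ENABLED. */
void cond_enabled(void);
/// \endcond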
Regarding the observed HTML problems, I modified the relevant test in doxygen slightly and pushed a proposed patch to github (pull request 714, https://github.com/doxygen/doxygen/pull/714).
Note: the problem of skipping a \cond in a normal C comment is quite a bit harder to fix (given the logical complexity of the doxygen code in pre.l and commentcnv.l).
EDIT 2018/06/10: The pull request has been integrated into the master version on github.

IPython (Jupyter) MathJaX preamble

Question
How can I set up a MathJax "preamble" for repeated use in IPython (or Jupyter) notebooks, in a way that is convenient for others reading my documents (on http://nbviewer.org) and that works for LaTeX/PDF generation?
Background
I would like to use IPython (now Jupyter) notebooks for documents that I later convert to PDF via LaTeX (using ipython nbconvert). The problem is how to include a bunch of macro definitions that I use in almost every document. Something like:
\newcommand{\vect}[1]{\vec{#1}}
\newcommand{\abs}[1]{\lvert#1\rvert}
\DeclareMathOperator{\erf}{erf}
etc. As far as the notebook is concerned, one unsatisfactory solution is to simply include these in a markdown cell at the top of the notebook, embedded between two pairs of dollar signs ($$) so they are interpreted as math. If this is done after some introductory text, it does not even affect the output.
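For example, such a markdown cell would contain nothing but the definitions from above, and MathJax renders it without any visible output:

$$
\newcommand{\vect}[1]{\vec{#1}}
\newcommand{\abs}[1]{\lvert#1\rvert}
$$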
The problem is that, when converting to LaTeX (for PDF export), these commands are embedded in a math environment in the LaTeX file. This has several problems:
Commands like \DeclareMathOperator must come in the LaTeX document preamble.
Command definitions are local to the equation and not available later in the document. (This can be overcome by using \gdef or \global\def but then one must trick MathJax into recognising these commands with something like \let\gdef{\def} which is somehow hidden from LaTeX. Any way I have found of making this work amounts to an ugly hack.)
Sometimes commands are already defined in LaTeX and need \renewcommand (not supported by MathJax, but again this can be provided by \let\renewcommand\newcommand etc., which seems reasonable to me since MathJax cannot know what preamble might be used for the final LaTeX file).
Probably the solution is to provide a set of macros to MathJax by adding code like the following (I am not sure what the equivalent of \DeclareMathOperator is here...)
<script type="text/x-mathjax-config">
  MathJax.Hub.Config({
    TeX: {
      Macros: {
        vect: ["{\\vec #1}", 1],
        abs: ["{\\lvert #1 \\rvert}", 1]
      }
    }
  });
</script>
to a custom.js file and then providing a LaTeX package for inclusion when converting to PDF. The problem I have with this approach is: How to distribute the custom.js file and LaTeX style file for others (collaborators and viewers) to use?
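(As an aside regarding the \DeclareMathOperator question: in such a Macros block one can presumably get away with a plain macro like erf: "{\\operatorname{erf}}", though that hard-codes one operator rather than providing a general declaration mechanism.)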
I want collaborators to be able to edit and read my documents without having to install custom extensions in their global configuration. To be specific, I am fine with requiring them to run a command like python setup.py configure once they download/checkout my code, which makes local modifications to the project (like populating ipython_notebook_config.py files in all directories containing notebooks), but I am not happy installing extensions or modifying their personal global custom.js file.
My stumbling block here is that I don't know how to add contributions from a local custom.js file to the notebook chain, and I suspect that this might violate a security policy.
The best solution would not require any action on my collaborator's part.
I want my notebooks to work on http://nbviewer.org, and for people to be able to download the notebook and produce a PDF. (I think this rules out the possibility of using custom.js hacks and a distributed *.sty file, but am not certain.)
I would prefer to be able to simply start a new notebook and then start writing, without having to insert a bunch of boilerplate code at the start of each notebook, though I would be amenable to a simple way of automating this process using a notebook extension or some hooks in ipython_notebook_config.py.
References
The following posts address some of these issues, but fall short on most fronts:
usepackage and making macros in ipython notebook
Physics bra-ket symbols in IPython (specifically this answer notes related difficulties)
How do I get MathJax to enable the mhchem extension in ipython notebook
Discussions about (potential) problems with the pandoc production of LaTeX files from IPython notebooks:
Getting some problems with pandoc and mathjax
\newcommand environment when convert from markdown to pandoc
Pandoc IPython notebook loses some Mathjax
General discussion of math in notebooks:
How to write LaTeX in IPython Notebook?
I think you can solve some of your problems, but not all.
First, the stumbling block. I believe (though I might be wrong) that nbviewer doesn't look at anything but the notebook itself. For example, I don't see how it could run an ipython_notebook_config.py stored alongside your notebook. So that rules out that line of thought, meaning that I think you'll have to bite the bullet and add boilerplate to every notebook. But you might at least be able to minimize the boilerplate. In that vein:
You could maintain your custom.js (probably under a more descriptive name) on github or whatever, and then add one line of boilerplate to all your notebooks to load that script from the URL. You would still need boilerplate, but it would be a lot shorter.
Once you have executed the code cell containing the javascript, it is saved in the notebook, which means that it will automatically happen the next time the browser loads it, even before the code cell is executed. So unless nbviewer prevents the javascript's execution, it should work just fine. This would also make things work nicely for collaborators, since they wouldn't have to download additional files.
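A sketch of such a boilerplate cell (the URL is hypothetical; point it at wherever you host the macro script):

from IPython.display import Javascript, display

# Fetch and run the shared MathJax macro definitions.
# Hypothetical URL: replace with your own hosted file.
display(Javascript(url="https://example.com/mathjax-macros.js"))

Whether nbviewer actually executes the saved javascript output is exactly the caveat raised above.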
As for your own style file, I suspect that anyone sophisticated enough to install ipython and latex, download your notebook, and run nbconvert on it would also be sophisticated enough to download the .sty file. Anyway, I don't see any way around the need to do that...

Verifying PEP8 in exported iPython notebook code

Is there a way to verify that an IPython notebook's code is PEP8-compliant, after it has been exported as an .ipynb file?
.ipynb files are pure JSON: you can read one, concatenate all the code cells, and run pep8 on the result. On the other hand, mapping the reported line numbers back to the right cell in order to "fix" them would be slightly more difficult.
I'm not aware of any project that does it right now.
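A minimal sketch of that approach (assuming an nbformat-4 notebook and the pep8 package, nowadays published as pycodestyle):

import json
import pep8  # deprecated name; the same API lives on in pycodestyle

# nbformat 4 keeps all cells in a top-level "cells" list.
with open("notebook.ipynb") as f:
    nb = json.load(f)

# Concatenate the source of every code cell. Note that IPython magics
# (lines starting with % or !) are not valid Python and will be flagged.
code = "\n".join(
    "".join(cell["source"])
    for cell in nb["cells"]
    if cell["cell_type"] == "code"
)

with open("notebook_code.py", "w") as f:
    f.write(code + "\n")

# Run the checker over the collected code.
report = pep8.StyleGuide().check_files(["notebook_code.py"])
print("PEP8 violations found:", report.total_errors)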
I just modified the pep8.py file to extract out the python code from the json and check it for pep8 compatibility. The modified pep8.py file.
Use it without installing (since it has not yet been reviewed):
python pep8.py notebook.ipynb --format="ipynb"
--format="ipynb" is used to get the line number offset on a per-code basis, instead of a cumulative numbering.
I've sent a Pull Request for the same on github.
Though I am not sure whether it will be merged, you might find it useful. Try it out!
EDIT: Looks like the PR won't get merged.

Getting more out of *.diff -files

I wonder if there are tools to show *.diff files used in patching related to Debian packaging. What I need from the tool is that it can read the diff file and show the actual files changed, with the changed rows, like kdiff or meld do when comparing two files directly. Or maybe my whole approach is wrong: how can I get more out of diff files?
Kompare is able to open a .diff, and it shows you the files changed at the top, a list of changes in the selected file, and a side-by-side diff (for the lines that it is able to extract from the .diff).
However, when I fed it a debdiff, it got confused. The diff did not have === file headers, only --- and +++ headers, and so it included the changes from /debian/changelog, /debian/copyright, and /debian/rules within the /debian/control file. Ymmv.
Screenshot: http://imagebin.ca/view/fNWEzx.html
The Debian diff format seems to be a special diff format. As my short Google search didn't turn up a graphical tool that can handle these files the way normal diff tools do, I'm not sure such a tool exists. Perhaps you could try converting these debdiff files to normal diff files (I didn't find a tool that would do that, either).
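One hedged suggestion: the patchutils collection (lsdiff, filterdiff, splitdiff) is not graphical, but it can slice a combined diff into per-file pieces that ordinary diff viewers cope with better, assuming the hunks themselves are well-formed:

# list the files touched by the diff
lsdiff package.debdiff

# extract only the changes to one file
filterdiff -i '*/debian/control' package.debdiff

# split the diff into one patch per file
splitdiff -a package.debdiff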
There is a tool to visualize changes in Linux packages (Deb, RPM, TAR.GZ, etc.) - pkgdiff.
Usage:
pkgdiff -old OLD.deb -new NEW.deb
Sample reports:
http://lvc.github.com/pkgdiff/pkgdiff_reports/libqb/0.4.1_to_0.8.1/changes_report.html
http://lvc.github.com/pkgdiff/pkgdiff_reports/gstreamer/0.10.23-i486-1_to_0.10.32-i486-1/changes_report.html