Adding CSV tables to GitHub using README.rst

So GitHub claims to support reStructuredText for README files, and that's great, because I have a bill of materials in a .csv file that I would like to add to my README.rst via the following:
.. csv-table::
   :widths: 25 25 25 25
   :file: bom.csv
However, when I view the page on my repository, all I see is the heading (as expected) and some text, but not the table. Further, I don't know where to find any output files from GitHub's internal parsing engine that would help me figure out what the problem is. This works just fine when building with Sphinx on my local machine (the table is embedded as expected). What is the potential issue here, and how do I view GitHub's internal outputs for a clue as to what is going wrong?

CSV tables are a feature of Sphinx-flavoured rST, not plain Docutils rST, and GitHub uses the plain one. You have to use either simple or grid tables on GitHub; see https://docutils.sourceforge.io/docs/user/rst/quickref.html#tables
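For example, a small bill of materials can be written as a plain grid table directly in README.rst (the part names and values below are hypothetical placeholders):

+-----------+---------+----------+-----+
| Reference | Value   | Package  | Qty |
+===========+=========+==========+=====+
| R1        | 10k     | 0603     | 2   |
+-----------+---------+----------+-----+
| C1        | 100n    | 0603     | 1   |
+-----------+---------+----------+-----+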

Related

Dataprep import dataset does not detect headers in first row automatically

I am importing a dataset from Google Cloud Storage (parameterized) into Dataprep. So far, this has worked perfectly fine, and one of the features I liked is that it auto-detects that the first row of my (application/octet-stream) .csv file contains my headers.
However, today I tried to import a new dataset and it did not detect the headers, but auto-assigned column1, column2, and so on instead.
What has changed, and why is this the case? I have checked the auto-detect box and use UTF-8.
While the auto-detect option is usually pretty good, there are times it fails for numerous reasons. I've specifically noticed this when the field names contain certain characters (e.g. commas, invisible characters like zero-width non-joiners, or null bytes), or when multiple different styles of newline delimiters are used within the same file.
Another case where I've seen this is when there are more columns of data than there are headers.
As you already hit on, you can use the following snippet to do mostly the same thing:
rename type: header method: filter sanitize: true
. . . or make separate recipe steps to convert the first row to a header and then bulk-rename the columns to your own liking.
More often than not, however, I've found that when auto-detect fails on a previously working file, it tends to be a sign of some sort of issue with the source file. I would look for mismatched data and misplaced commas in the output, and compare the header and a few data rows against the original source in a plaintext editor.
When all else fails, you can try a CSV validator . . . but in my experience they tend to be incredibly opinionated when it comes to the formatting options of the file, so depending on the system generating the CSV, they could either miss errors or give false positives. I have had two experiences where auto-detect failed for no apparent reason on perfectly clean files, so it is possible that the process was simply skipped for some reason.
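If you want a quick sanity check yourself, a minimal sketch along these lines (using Python's standard csv module; the file name input.csv and the delimiter/encoding are assumptions to adjust for your source) flags rows whose column count differs from the header's:

import csv

# Hypothetical file name; adjust the delimiter/encoding to match your source.
with open("input.csv", newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    header = next(reader)
    for lineno, row in enumerate(reader, start=2):
        if len(row) != len(header):
            print(f"line {lineno}: expected {len(header)} columns, got {len(row)}")

Rows reported here usually point to the misplaced commas or mismatched data mentioned above.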
It should also be noted that if you have a structured file that was correctly detected but you want to revert it, you can go to the dataset details, select the "..." (More) button, and choose "Remove structure..." (I'm hoping that one day they'll let you do the opposite, when you want to add structure to a raw dataset or work around bugs like this!)
Best of luck!
This can be resolved as a transformation within a Flow:
rename type: header method: filter sanitize: true

EDMX Update Model adds blank lines during autogeneration

I have an annoying problem and can't seem to figure out what's causing it. On my machine, when I try to use Update Model from Database... on an EDMX file in the EF Database First approach, the autogenerated model has blank lines between properties. This doesn't seem to occur on other developers' machines, even though we have the same versions of VS, extensions, etc.
The problem is that even when I add, for example, a single new table, the refresh automatically adds blank lines for all mapped tables. Later, all of this shows up as conflicts during merge operations in Git.
I would really appreciate any help, since I didn't find a single shred of information on this issue anywhere, and it really disrupts work.
I checked the files (Model.tt on my machine and on my friend's) using the Notepad++ comparer, and it said there were no differences, but the encoding was different. When I copied Model.tt over manually and ran the update, the blank lines were gone... Must be some kind of quirk.
Posting this as an answer, since I wasted a few hours on it and someone might have a similar problem.
What worked for me
💡 Turns out it was how my OS was ending lines.
I work on Windows. I had earlier disabled automatic CR+LF line-ending conversion in my global Git configuration; re-enabling it fixed the issue:
git config --global core.autocrlf true
FYI: Unix and Mac end lines with LF only, while Windows ends lines with CR+LF.
Opened up *DataModel.tt and *DataModel.Context.tt in Notepad++.
Edit > EOL Conversion > Windows (CR LF), then save.
Refreshed the EDMX.
I'm still looking for a better terminal-based solution; it sounds like dos2unix will come into play at some point. I will amend this answer as soon as I've ironed that out.
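In the meantime, a rough terminal sketch (assuming the dos2unix package is installed; its unix2dos companion converts LF-only endings back to the CR+LF that the .tt templates need here):

find . -name '*.tt' -exec unix2dos {} +

Alternatively, a .gitattributes entry such as *.tt text eol=crlf pins the line endings for those files regardless of each developer's core.autocrlf setting.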

\cond statement in doxygen does not work

I am trying to separate internal and external documentation using doxygen's \cond construct, but I just can't seem to get it working. I would essentially like to exclude some files completely, not conditionally. Regardless of where I add the tag (before the includes, before the header guards, etc.), both the files and the source show up.
What I have tried, in vain, is to take the conditional test file from the doxygen repo and add it to the project.
Steps to reproduce [Linux]:
Create a new directory.
Copy-paste the above file (I had to rename it to .h, as .c was passed over?).
Generate a dummy config via doxygen -g.
Update the Doxyfile: ENABLED_SECTIONS = COND_ENABLED.
Run doxygen.
Check html/index.html.
The conditional code is, however, still visible in the HTML documentation generated for the project. I have set the ENABLED_SECTIONS variable to other values, but the cond_enabled function still shows up. Running the testing directory of the (doxygen) project, it passes. So I am lost.
Any suggestions?
Tried with the latest version, 1.8.14.
Thanks!
Regarding the \cond problems (not a direct answer to the real problem you face, I think, but too long for a comment):
The mentioned file is used in the (limited) testing that doxygen does, and its first lines contain some instructions on what to do. Furthermore, there is a default Doxyfile in use with the tests. It is hard to run a separate test outside the doxygen build tree.
Regarding the remark "Running the testing directory of the project (doxygen) it passes": this is correct. At the moment, testing is done only against the XML output, and the generated output is compared with a once-created reference version of the XML output. No tests are currently done for HTML or PDF / LaTeX. Recently the test framework has been slightly extended, so comparing the xhtml and tex output should become possible in the future (though some work still has to be done here).
This version of the parser sees the \cond in the first line (a normal C comment) as a doxygen command and skips everything up to the first \endcond (your friend in these cases is always doxygen -d preprocessor). I think that removing or modifying the first line will already give a better result. There is, however, another hiccup for e.g. HTML output: as the function cond_enabled is not documented and EXTRACT_ALL is not set to YES, the function will not appear in the documentation. So it is best to also add a line of documentation to the function cond_enabled.
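To illustrate, a minimal sketch of a header (not the actual test file; the section name matches the ENABLED_SECTIONS = COND_ENABLED setting above, and the function carries a documentation line so it shows up without EXTRACT_ALL):

/** @file example.h A hypothetical header for the \cond experiment. */

/// \cond COND_ENABLED
/// This function only appears when COND_ENABLED is listed in ENABLED_SECTIONS.
void cond_enabled(void);
/// \endcond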
Regarding the observed HTML problems, I modified the relevant test in doxygen slightly and pushed a proposed patch to github (pull request 714, https://github.com/doxygen/doxygen/pull/714).
Note: the problem of skipping a \cond inside a normal C comment is quite a bit harder to fix (given the logical complexity of the doxygen code in pre.l and commentcnv.l).
EDIT 2018/06/10: The pull request has been integrated into the master version on github.

Can I configure Jupyter Notebook to split source files and generated files?

I really like Jupyter Notebooks.
However, working with them is cumbersome in conjunction with a source control system like git, because an .ipynb file contains both the source code (what you actually write in the notebook) and the generated output: text / HTML / images / metadata / ...
For example, merge conflicts are now difficult to resolve, because everything is stored in one huge file with lots of generated data.
I wonder if I can configure Jupyter to store notebooks as
A source file: For example, I imagine this to be a Markdown file where everything surrounded by three backticks (```) is interpreted as a code cell. Diffs of that file would be meaningful and merge conflicts would be simple to resolve manually.
A generated file: This contains everything else. If there is a merge conflict within this file, it can be resolved by regenerating it.
Is this possible?
For reference: there is a slightly more general version of this question which lists various efforts at adapting IPython and Jupyter to this end, and this answer proposes solving the problem via Git. There is a GitHub project with a Git filter based on that answer, and (in the edit at its end) the answer links a few similar tools like nbstripout.
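As a concrete starting point, nbstripout (mentioned above) installs itself as a Git filter that strips outputs and metadata before commits. It does not produce the separate Markdown source file imagined in the question, but it does make diffs meaningful and merge conflicts tractable. A minimal setup sketch, run inside the repository:

pip install nbstripout
nbstripout --install

The --install step writes the filter into the repository's Git configuration and attributes so that *.ipynb files are stripped on the fly; the working copies on disk keep their outputs.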

Mercurial: Pre and Post merge operations (per file)

We are using Mercurial as the SCM to handle the source script files of a program. Each project we manage has ~5000 files, and each file contains a section with some product-specific information about the file itself (version list, date, time, etc.). Due to the way it is structured, in 80% of merges this section is the only one that has conflicts. They are easily resolved, but when merging around 300 files, it gets tiresome.
The problem is: I have no control over the way this section is written, and I cannot change the format of the section itself, as that would make the file unusable by the program.
My question: is there a way in Mercurial (hooks?) that allows me to
pre-process the file with a script,
let Mercurial do the merge, and
if it merged correctly, post-process the file with a script; otherwise, resolve conflicts as usual?
You could probably get away with it by creating a custom merge tool:
https://www.mercurial-scm.org/wiki/MergeToolConfiguration
A simple script that invokes 'diff' after removing the ever-changing sections might be enough.
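A rough sketch of that idea, with everything hypothetical: the tool name, the script path, and the $Header$ ... $End$ markers assumed to delimit the volatile section. First, the merge-tool entry in .hgrc:

[merge-tools]
stripmerge.executable = /path/to/stripmerge.sh
stripmerge.args = $base $local $other $output

[merge-patterns]
** = stripmerge

Then the script itself. Note that this sketch drops the stripped section from the merged result, so re-inserting the correct section afterwards is exactly the post-processing step the question asks about:

#!/bin/sh
# Arguments supplied by Mercurial: $1=base $2=local $3=other $4=output
strip() { sed '/\$Header\$/,/\$End\$/d' "$1" > "$1.stripped"; }  # hypothetical markers
strip "$1"; strip "$2"; strip "$3"
# Three-way merge (mine, older, yours); a non-zero exit reports a conflict to Mercurial.
diff3 -m "$2.stripped" "$1.stripped" "$3.stripped" > "$4"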
It sounds like those sections are the sort of nonsense that the (disrecommended) KeywordExtension is built to handle, but I gather you don't have a lot of flexibility around them.