Github incorrectly detects Languages of my project as "Roff" - github

In one of my repositories nearly all of my code is Python and some HTML.
However, Github thinks otherwise:
What causes that?

You were creating files through a script, with an unintended extension. That is, your script was inserting a dot in the file name.
Simply rename your file my_file_0.5ms to my_file_05ms.txt and it will display the correct languages:
What you could do to fix similar problems in the future is use a script to detect extensions and the total lines of code for each extension.

Solution
GitHub Linguist is the culprit in this situation, but luckily, it can be easily resolved in a number of ways.
Create a .gitattributes file and list patterns that match the files you want to ignore, and then append either linguist-vendored or linguist-documentation.
specific-file.5ms
*.5ms
specific-folder/*
This will remove the files from your GitHub repositories statistics on the next run of Linguist (it may take some time).
Notes
If you'd like to attribute these files to a specific language, you can do that using linguist-language={name}. Full documentation on overriding Linguist can be found here.
You can also run Linguist on your own computer, but note that any changes to .gitattributes will not take effect until you commit to your repository. Linguist will not see changes that exist only in the index.

Related

Can I configure Jupyter Notebook to split source files and generated files?

I really like Jupyter Notebooks.
However, working with them is cumbersome in conjunction with a source control system like git, because an ipynb-File contains the source code (what you actually write in the notebook) and the generated output text / HTML / images / metadata / ...
For example, merge conflicts are difficult to resolve now, because everything is stored in one huge file with lots of generated data.
I wonder if I can configure Jupyter to store notebooks as
A source file: For example, I imagine this to be a Markdown file where everything surrounded by three backticks (```) is interpreted as a code cell. Diffs of that file would be meaningful and merge conflicts would be simple to resolve manually.
A generated file: This contains everything else. If there is a merge conflict within this file, it can be resolved by regenerating it.
Is this possible?
For reference: There is a slightly more general version of this question which lists various efforts at adapting IPython and Jupyter to this effect, and this answer proposes to solve the problem via Git. There is a Github project with a Git filter based on that answer, and (in its edit at the end) the answer links a few similar tools like nbstripout.

Which source control uses a "s." prefix on its filenames?

I found what appears to be an old source repository for some source code that I need to resurrect. But I have no idea what source control tools were used to generate and manage this source repository. In the directory, all of the files have a "s." prefixed to the file name. Without knowing the format in these files, I cannot manually extract the source code with any degree of accuracy. And even if I did, manually extracting the source code would be very time consuming and error prone.
What source/version control system prefixes its source files with "s." when it stores the source file in its repository directory?
How can I effectively extract the latest source code from this repository directory?
The s. prefix is characteristic of SCCS, the Source Code Control System. The code for that is probably still proprietary, but GNU has the CSSC project which can manipulate SCCS files. It tracks changes per-file in revisions, known as 'deltas'.
SCCS is the official revision control system for POSIX; you can find the commands documented on the Open Group site (but the file format is not specified there, AFAICT):
admin
delta
get
prs
rmdel
sact
unget
val
what
The file format is not specified by POSIX. The manual page for get says:
The SCCS files shall be files of an unspecified format.
The original SCCS command set included some extras not recorded by POSIX:
cdc — change delta commentary (for changing the checkin comments for a delta)
comb — combine, effectively for merging deltas
help — no prefix; the wasn't any other help program at the time. Commands generate error codes such as cm3 and help interpreted them.
sccsdiff — difference between two deltas of a file
Most systems now have a single command, sccs, which takes the operation name and then options. Often, the files were placed into an ./SCCS/ subdirectory and extracted from that as required, and the sccs front-end would handle name expansion, adding s. or SCCS/s. to the start of the file names.
For extracting the latest version of the source code, use get.
get s.*
sccs get s.*
These will get the default version of each file, and the default default is the latest version of the file.
If you need to make changes, use:
get -e s.filename.c
...make changes...
delta -y'Why you made the changes' s.filename.c
get s.filename.c
Note that the files 'lose' the s. prefix for the working file names, rather like RCS (Revision Control System) files lose the ,v suffix for the working file names. If you've not come across that, accept that it was different when SCCS and RCS were created, back in the late 70s or early 80s.
SCCS uses an s. prefix. But it might not be the only one!
I never knew this knowledge would come in useful some day!

.gitignore rules not working

I'm in the process of building a Lemonstand site, which I'm managing via git. As such, I'm using the following .gitignore rules:
# Lemonstand
.htaccess
php.ini
boot.php
index.php
install.php
/config/*
/controllers/*
/init/*
/logs/*
/phproad/*
/temp/*
/uploaded/*
/installer_files/*
/modules/backend/*
/modules/blog/*
/modules/cms/*
/modules/core/*
/modules/session/*
/modules/shop/*
/modules/system/*
/modules/users/*
# add content_*.php if you don't want erase client changes to content
/modules/gallery/*
/modules/lddevel/*
/modules/tweak/*
(The top block I got from github, with the second block being some additional rules I added myself to avoid issues with Lemonstands updating system).
My problem is that I'm adding a custom invoice to Lemonstand, which (to cut a long story short) requires the addition of a folder and some files within /modules/shop/invoice_templates/, which I've named cm_standard.
Because this is extra to the default Lemonstand, I want to get this tracked with git, so I'm trying to use the following rule to the bottom of my gitignore file:
!/modules/shop/invoice_templates/cm_standard/*
But when I do a git add -A, it isn't picking up the files within that directory. Even if I do a git add modules/shop/invoice_templates/cm_standard/* it tells me:
The following paths are ignored by one of your .gitignore files:
modules/shop/invoice_templates
Use -f if you really want to add them.
fatal: no files added
Which further suggests I've not written the rule correctly - can anyone help? Thank you.
Ignore patterns with fewer path segments can take precedence over patterns with more path segments, so in your case /modules/shop/* is taking precedence over !/modules/shop/invoice_templates/cm_standard/*, effectively pruning the whole of /modules/shop/invoice_templates/ from the directory traversal even before it looks at the contents of !/modules/shop/invoice_templates/cm_standard. Having said that, ordering can matter too, and when it does, somewhat counter-intuitively later rules within a file take precedence over earlier ones.
This question is very similar to How do gitignore exclusion rules actually work? so I suggest you read that. Also you may find the check-ignore subcommand useful when debugging rules - I added it to git over the last few months and it just appeared in version 1.8.2 which was released a few days ago.

How to join two files in a version control system

I am doing a refactoring of my C++ project containing many source files.
The current refactoring step includes joining two files (say, x.cpp and y.cpp) into a bigger one (say, xy.cpp) with some code being thrown out, and some more code added to it.
I would like to tell my version control system (Perforce, in my case) that the resulting file is based on two previous files, so in future, when i look at the revision history of xy.cpp, i also see all the changes ever done to x.cpp and y.cpp.
Perforce supports renaming files, so if y.cpp didn't exist i would know exactly what to do. Perforce also supports merging, so if i had 2 different versions of xy.cpp it could create one version from it. From this, i figure out that joining two different files is possible (not sure about it); however, i searched through some documentation on Perforce and other source control systems and didn't find anything useful.
Is what i am trying to do possible at all?
Does it have a conventional name (searching the documentation on "merging" or "joining" was unsuccessful)?
You could try integrating with baseless merges (-i on the command line). If I understand the documentation correctly (and I've never used it myself), this will force the integration of two files. You would then need to resolve the integration however you choose, resulting in something close to the file you are envisioning.
After doing this, I assume the Perforce history would show the integration from the unrelated file in it's integration history, allowing you to track back to that file when desired.
I don't think it can be done in a classic VCS.
Those versioning systems come in two flavors (slide 50+ of Getting git by Scott Chacon):
delta-based history: you take one file, and record its delta. In this case, the unit being the file, you cannot associate its history with another file.
DAG-based history: you take one content and record its patches. In this case, the file itself can vary (it can be renamed/moved at will), and it can be the result of two other contents (so it is close of what you want)... but still within the history of one file (the contents coming from different branches of its DAG).
The easy part would be this:
p4 edit x.cpp y.cpp
p4 move x.cpp xy.cpp
p4 move y.cpp xy.cpp
Then the tricky part becomes resolving the move of y.cpp and doing your refactoring. But this will tell Perforce that the files are combined.

Can you override Mercurial's use of ".hg" and ".hgignore"?

We're having a compatibility problem between Mercurial and another synchronization product (MS Office Groove, now Sharepoint Workspace 2010). Basically, the ".hg" folder and ".hgignore" files are being blocked (I've summarized the issue with the other software).
We've been told that the only work-around is to change the name of the folder. Is there any way to modify Mercurial's naming convention for the database folder and ignore file? From what I understand, I'd just need to make the names more "normal" by adding a prefix. Sort of a "Hail Mary" but thought I'd ask.
You can use a different file name for .hgignore (or more accurately have no .hgignore and then use an ignore= line in your hg/hgrc file to reference an additional ignore file with whatever name you want). However you can't use an alternative name for .hg itself.
As Lasse suggests you shouldn't be synchronizing .hg directories externally anyway. Their read/write/lock semantics are very specific and the only reading/writing to them should be via hg itself. It's very easy to set up a cron job that automatically pushes to a backup repo as often as you'd like.
The names are hard-coded in the Mercurial source, search for occurrences of ".hg" and ".hgignore" (for added safety, also using single quotes). However, since a typical Mercurial installation provides the Mercurial source code, it's easy to change this to anything you prefer.