How does GitHub guess the encoding of a file? - github

How does GitHub guess the encoding of a text file?
I have two text files in my repository, README.ru.koi8-r and mpman-ru.tex; both are encoded in koi8-r. GitHub picks the right encoding for the first one and the wrong one for the second.
Is there a trick to force the right guess?
Postscript: I solved the problem by adding a long comment in koi8-r at the top of the file, but there should be a better way to do it.

The GitHub documentation states that you should in fact "determine encoding for every single file". It also says that "(...) encoding could be set in .gitattributes file.", which is probably what you are looking for, since a .gitattributes file is committed and pushed to GitHub with the rest of the repository (see the documentation for gitattributes files on git-scm).

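You can set the encoding for a repository with
git config gui.encoding koi8-r
Note that this setting is local: it controls how git gui and gitk display file contents, and it is not pushed to GitHub.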
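The postscript in the question (a long koi8-r comment fixed the guess) makes sense once you note that koi8-r bytes are also valid in other single-byte Cyrillic encodings, so a detector can only guess from statistics. The sketch below illustrates this with Python's built-in codecs. It is not GitHub's detection code, and the sample string is made up for the example.

```python
# Illustration only: the same bytes decode "successfully" under two encodings.
sample = "Привет"                  # hypothetical sample text
raw = sample.encode("koi8_r")      # bytes as they would be stored in a koi8-r file

as_koi8 = raw.decode("koi8_r")     # the intended reading
as_cp1251 = raw.decode("cp1251")   # decodes without error, but to different letters

print(as_koi8 == sample)    # True
print(as_cp1251 == sample)  # False
```

Because neither decode raises an error, a short file gives a detector very little evidence to choose between the candidates.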

Related

GitHub incorrectly detects the language of my project as "Roff"

In one of my repositories, nearly all of the code is Python, with some HTML.
However, GitHub thinks otherwise and reports the project as Roff.
What causes that?
You were creating files through a script with an unintended extension: the script was inserting a dot into the file name.
Simply rename your file from my_file_0.5ms to my_file_05ms.txt and GitHub will display the correct languages.
To catch similar problems in the future, you could use a script that lists the extensions in the repository and the total lines of code for each one.
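A minimal sketch of such a script, assuming plain line counts per extension are enough to spot the outlier. The function name count_lines_by_extension is made up for this example, and the counts are raw lines, not what Linguist itself measures.

```python
import os
from collections import Counter


def count_lines_by_extension(root):
    """Return a Counter mapping file extension -> total line count under root."""
    totals = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git's own metadata directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            # "my_file_0.5ms" yields the extension ".5ms".
            ext = os.path.splitext(name)[1] or "(none)"
            with open(os.path.join(dirpath, name), "rb") as handle:
                totals[ext] += sum(1 for _ in handle)
    return totals
```

Calling count_lines_by_extension(".").most_common() from the repository root lists the extensions with the most lines first, so an unexpected extension such as .5ms stands out.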
Solution
GitHub Linguist is the culprit in this situation, but luckily, it can be easily resolved in a number of ways.
Create a .gitattributes file, list patterns that match the files you want to ignore, and append either linguist-vendored or linguist-documentation to each pattern:
specific-file.5ms linguist-vendored
*.5ms linguist-vendored
specific-folder/* linguist-vendored
This will remove the files from your GitHub repository's language statistics on the next run of Linguist (it may take some time).
Notes
If you'd like to attribute these files to a specific language, you can do that using linguist-language={name}. Full documentation on overriding Linguist can be found in the Linguist documentation.
You can also run Linguist on your own computer, but note that any changes to .gitattributes will not take effect until you commit to your repository. Linguist will not see changes that exist only in the index.

How to display images in Markdown files on GitHub?

I want to display some images in a Markdown file on GitHub. I found it works this way:
![Figure 1-1](https://raw.github.com/username/repo/master/images/figure 1-1.png "Figure 1-1")
But I need to collaborate with others, so I don't want the username and repo name hard-coded.
I tried to use this:
![Figure 1-1](images/figure 1-1.png "Figure 1-1")
It works on my local disk but not on GitHub.
Does anyone know about this issue?
I found the answer myself.
Simply appending ?raw=true to the image URL does the trick:
![](images/table 1-1.png?raw=true)
I just had the same issue and it turned out to be caused by the space in the URL. Manually URL encoding the space as %20 fixed it.
So using your example I changed:
![](images/table 1-1.png)
to:
![](images/table%201-1.png)
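If you generate such links from a script, the percent-encoding can be done for you. Here is a minimal sketch using Python's standard urllib.parse.quote; the helper name markdown_image is made up for this example.

```python
from urllib.parse import quote


def markdown_image(path, alt=""):
    """Build a Markdown image link with the path percent-encoded."""
    # quote() leaves "/" untouched by default and turns a space into %20.
    return f"![{alt}]({quote(path)})"


print(markdown_image("images/table 1-1.png"))  # ![](images/table%201-1.png)
```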
2021 Edit: Thanks Emilio for pointing out that the GitHub flavored markdown spec has been updated to allow spaces in filenames when the filename is enclosed inside "pointy" (angle) brackets:
The destination can only contain spaces if it is enclosed in pointy brackets
Example 498
[link](</my uri>) --> <p><a href="/my%20uri">link</a></p>
Ref: https://github.github.com/gfm/#example-498 (scroll up for description)
This works with images too so we can now also use:
![](<images/table 1-1.png>)

Version control for DOCX and PDF?

I've been playing around with git and hg lately, and it suddenly occurred to me that this kind of tool would be great for documents.
I have a document which I edit as DOCX and export as PDF. I tried using both git and hg to version-control it, and it turns out that with hg you end up tracking only binaries, so diffing isn't meaningful. With git I can meaningfully diff DOCX (I haven't tried PDF yet), but I was wondering whether there is a better way than what I'm doing right now. (Ideally, not having to leave Word to diff would be the best solution.)
There are two different concepts here. One is "can the version control system make some intelligent judgements about the contents of files?", so that it can store just delta information between revisions (and do things like assign responsibility to individual parts of a file).
The other is "do I have a file comparison tool which is useful for the types of files I have in the version control system?". Version control systems tend to come with file comparison tools which are inferior to dedicated alternatives, but they can pretty much always be linked to better diff programs, either for all file types or for specific ones.
So it's common to use, for example, Beyond Compare as a general compare tool, with Word as a dedicated Word document comparer.
Different version control systems differ as to how good people perceive them to be at handling 'binaries', but that's often as much to do with handling huge files and providing exclusive locking as it is to do with file comparison.
TortoiseHg (http://tortoisehg.bitbucket.io/) includes a plugin called docdiff that integrates Word and Excel diffing.
You can use Beyond Compare as an external diff tool for hg. Add the following to your user mercurial.ini (the extdiff extension must be enabled for the cmd.vdiff setting to take effect):
[extensions]
extdiff =
[extdiff]
cmd.vdiff = c:/path/to/BCompare.exe
Then get the Beyond Compare file viewer rule for docx.
Now you should be able to compare two versions of a docx file in Beyond Compare.
This article outlines a solution for DOCX using Pandoc, while this post outlines a solution for PDF using pdf2html.
For docx only, I compiled instructions from multiple places here: https://gist.github.com/nachocab/6429893
# download docx2txt by Sandeep Kumar
wget -O docx2txt.pl http://www.cs.indiana.edu/~kinzler/home/binp/docx2txt
# make a wrapper
echo '#!/bin/bash
docx2txt.pl "$1" -' > docx2txt
chmod +x docx2txt docx2txt.pl
# make sure docx2txt.pl and docx2txt are on your PATH. Here's a guide:
# http://shapeshed.com/using_custom_shell_scripts_on_osx_or_linux/
mv docx2txt docx2txt.pl ~/bin/
# set the git attributes from inside the repository (unfortunately I don't think this can be set by default; you have to create it for every project)
echo "*.docx diff=word" > .git/info/attributes
# add the following to ~/.gitconfig
[diff "word"]
binary = true
textconv = docx2txt
# add a new alias
[alias]
wdiff = diff --color-words
# try it
git init
# create my_file.docx, add some content
git add my_file.docx
git commit -m "Initial commit"
# change something in my_file.docx
git wdiff my_file.docx
# awesome!
It works great on OSX
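If Perl or docx2txt is not available, the same textconv idea can be sketched with the Python standard library, since a .docx file is a zip archive whose main text lives in word/document.xml. This is a rough illustration and not a replacement for docx2txt: it reads only the main document part and ignores headers, footnotes and formatting. The function name docx_to_text is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source):
    """Return the body text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    for paragraph in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        runs = [node.text or "" for node in paragraph.iter(W_NS + "t")]
        lines.append("".join(runs))
    return "\n".join(lines)
```

Git calls a textconv program with the file path as its single argument and reads the converted text from standard output, so a small wrapper script that prints docx_to_text(sys.argv[1]) could stand in for the docx2txt wrapper above.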
If you happen to use a Mac, I wrote a git merge driver that can use Microsoft Word and tracked changes to merge and show conflicts between any file types Word can read & write.
http://github.com/jasmas/wordMerge
I say 'if you happen to use a Mac' because the driver I wrote uses AppleScript, primarily to accomplish this task.
It'd be nice to add a VBScript version to the project, but at the moment I don't have a Windows environment for testing. Anyone with some basic scripting knowledge should be able to take a look at what I'm doing and duplicate it in VBScript, PowerShell or whatever on Windows.
I used SVN (yes, in 2020 :-)) with TortoiseSVN on Windows. It has a built-in function to compare DOCX files (it opens Microsoft Word in a mode where your screen is divided into four parts: the file after the changes, before the changes, with changes highlighted and a list of changes). I also checked TortoiseGit and it also has this functionality. I've read that TortoiseHg has it as well.

Perforce wildcard problem in branch specification

In a branch spec, I have the following view:
//depot/dev/t/a/g/... //depot/dev/t/r/g/...
-//depot/dev/t/a/g/p/o*/... //depot/dev/t/r/g/p/...
Perforce reports an "Incompatible wildcards" error for the second rule there.
What I'd like to do is exclude all the directories beginning with "o".
What am I doing wrong, and how do I fix this?
I think you need to have matching wildcards on both sides of each mapping. Try:
//depot/dev/t/a/g/... //depot/dev/t/r/g/...
-//depot/dev/t/a/g/p/o*/... //depot/dev/t/r/g/p/o*/...
While not a direct answer to the question (answered above), I was stumped on the same message and found this post while trying to search for a solution.
In my case, the cause was that when I copy-pasted the workspace mapping from another file, a single ellipsis character (…) had been substituted for the Perforce "..." wildcard. To fix this, I deleted the ellipsis and typed three periods in its place.
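The substituted character is hard to spot by eye, so a quick check can help before pasting a view into a spec. A minimal sketch, assuming the culprit is the Unicode horizontal ellipsis (U+2026); the function names are made up for this example.

```python
ELLIPSIS = "\u2026"  # single-character ellipsis, often produced by autocorrect


def has_unicode_ellipsis(line):
    """Return True if the mapping line contains the one-character ellipsis."""
    return ELLIPSIS in line


def fix_view_line(line):
    """Replace the one-character ellipsis with Perforce's three-period wildcard."""
    return line.replace(ELLIPSIS, "...")


pasted = "//depot/dev/t/a/g/\u2026 //depot/dev/t/r/g/\u2026"
print(has_unicode_ellipsis(pasted))  # True
print(fix_view_line(pasted))         # //depot/dev/t/a/g/... //depot/dev/t/r/g/...
```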

CVS keyword substitution and Microsoft Word file

CVS has a keyword substitution feature: in a text file you write $Header$ and, when you commit the file, CVS replaces $Header$ with something like $Header: /repo/src.cpp,v 1.6 2009/03/12 14:53:14 luser Exp $.
Is it possible to get the same feature when dealing with a binary Microsoft Word file?
Thank you.
The basic problem you have with a Word file is that it is effectively a binary file (as opposed to a plain-text file), so you cannot be sure a key string like "$Header$" doesn't appear somewhere (VB macro code, for example) by accident. CVS would expand that key string, and suddenly something apparently unrelated (VB macro code, for example...) stops working.
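To get a feel for that risk, you can scan a binary file for byte sequences that look like CVS keywords. This is a rough sketch under stated assumptions: it only finds keywords stored as plain 8-bit text, so anything Word stores in a wider encoding such as UTF-16 would be missed, and the function name find_keywords is made up for this example.

```python
import re

# Keyword names recognised by CVS keyword substitution.
KEYWORDS = (b"Author", b"Date", b"Header", b"Id", b"Locker", b"Log",
            b"Name", b"RCSfile", b"Revision", b"Source", b"State")

# Matches both the bare form ($Header$) and the expanded form ($Header: ... $).
KEYWORD_RE = re.compile(rb"\$(" + b"|".join(KEYWORDS) + rb")(:[^$\r\n]*)?\$")


def find_keywords(data):
    """Return (offset, matched bytes) for each keyword-like string in data."""
    return [(m.start(), m.group(0)) for m in KEYWORD_RE.finditer(data)]
```

For a real file you would pass the result of open(path, "rb").read(). Binary files are normally added with cvs add -kb, which turns keyword expansion off for that file and avoids this kind of accidental rewrite.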
Using CVS? Not likely. Even if $Header$ doesn't appear anywhere in your Word document (as DevSolar suggested it might), where do you place that string? Word stores text in its proprietary binary format, but CVS looks for plain text.
On the other hand, I'm sure you can achieve the effect by using either an XML Word format, or a Word macro.
It seems almost impossible with the traditional .doc format. Some creative work might allow you to create a process for making it happen with the newer XML format. I'm not sure CVS can do the job even then, but using a post-commit hook in Subversion might make it more reasonable to pull off.