Does GitHub Linguist support prefix wildcards for linguist-vendored? - github

In GitHub's documentation on linguist, the section on using the .gitattributes file says a path can be marked as vendored, and thus ignored in the repository's statistics tracking, with:
special-vendored-path/* linguist-vendored
However, is it possible to have linguist mark directories as vendored that may be nested in directories containing non-vendored code?
I tried adding a line styled as */special-vendored-path/* linguist-vendored to my .gitattributes, but that didn't cause the GitHub code-proportion information to change.

To match a directory inside an arbitrary arborescence of directories, you need double asterisks:
**/special-vendored-path/* linguist-vendored
Note, however, that double asterisks are not needed at the end of paths. For example, test1/* will match test1/test2/test3/file.

Related

Github incorrectly detects Languages of my project as "Roff"

In one of my repositories nearly all of my code is Python and some HTML.
However, Github thinks otherwise:
What causes that?
You were creating files through a script, with an unintended extension. That is, your script was inserting a dot in the file name.
Simply rename your file my_file_0.5ms to my_file_05ms.txt and it will display the correct languages:
What you could do to fix similar problems in the future is use a script to detect extensions and the total lines of code for each extension.
Solution
GitHub Linguist is the culprit in this situation, but luckily, it can be easily resolved in a number of ways.
Create a .gitattributes file and list patterns that match the files you want to ignore, and then append either linguist-vendored or linguist-documentation.
specific-file.5ms
*.5ms
specific-folder/*
This will remove the files from your GitHub repositories statistics on the next run of Linguist (it may take some time).
Notes
If you'd like to attribute these files to a specific language, you can do that using linguist-language={name}. Full documentation on overriding Linguist can be found here.
You can also run Linguist on your own computer, but note that any changes to .gitattributes will not take effect until you commit to your repository. Linguist will not see changes that exist only in the index.

VS Code: Search multiple directories

When searching, VS Code has the ability to list files to include to scope the search. This is used by default when using the "find in folder" feature. For example, searching src results in ./src as the files to include.
Is there a syntax I can use to list multiple directories here? For example, I want to search ./src and ./lib in one search.
Did you try a comma like ./dir1, ./dir2? For me it seems to work
By the way, here is the documentation of 'files to include': https://code.visualstudio.com/Docs/editor/codebasics#_advanced-search-options
In particular, you can use glob notation. Also, VS Code will include/exclude certain directories or files by default, depending on your settings.json, in case anyone still sees unexpected behaviour.

Using gitattributes for linguist examples

Are there any concrete examples, in order to detect wrong languages in GitHub via Linguist attributes?
Source: https://github.com/github/linguist
linguist-documentation
linguist-language
linguist-vendored
Examples can be found in Linguist's documentation. What you want can be achieved with linguist-language attributes.
linguist-language
With the following attribute, Linguist detects all .rb files as being Java files.
*.rb linguist-language=Java
linguist-vendored
With the following attribute, Linguist detects files in the special-vendored-path directory (notice the mandatory trailing *) as vendored and excludes them from statistics.
special-vendored-path/* linguist-vendored
linguist-documentation
Without the following attribute, Linguist would detect the file docs/formatter.rb as documentation and exclude it from statistics.
docs/formatter.rb linguist-documentation=false
linguist-detectable
With the following attribute, Linguist counts SQL files in statistics. Without this attribute, only programming and markup languages are counted in statistics.
*.sql linguist-detectable=true

.gitignore rules not working

I'm in the process of building a Lemonstand site, which I'm managing via git. As such, I'm using the following .gitignore rules:
# Lemonstand
.htaccess
php.ini
boot.php
index.php
install.php
/config/*
/controllers/*
/init/*
/logs/*
/phproad/*
/temp/*
/uploaded/*
/installer_files/*
/modules/backend/*
/modules/blog/*
/modules/cms/*
/modules/core/*
/modules/session/*
/modules/shop/*
/modules/system/*
/modules/users/*
# add content_*.php if you don't want erase client changes to content
/modules/gallery/*
/modules/lddevel/*
/modules/tweak/*
(The top block I got from github, with the second block being some additional rules I added myself to avoid issues with Lemonstands updating system).
My problem is that I'm adding a custom invoice to Lemonstand, which (to cut a long story short) requires the addition of a folder and some files within /modules/shop/invoice_templates/, which I've named cm_standard.
Because this is extra to the default Lemonstand, I want to get this tracked with git, so I'm trying to use the following rule to the bottom of my gitignore file:
!/modules/shop/invoice_templates/cm_standard/*
But when I do a git add -A, it isn't picking up the files within that directory. Even if I do a git add modules/shop/invoice_templates/cm_standard/* it tells me:
The following paths are ignored by one of your .gitignore files:
modules/shop/invoice_templates
Use -f if you really want to add them.
fatal: no files added
Which further suggests I've not written the rule correctly - can anyone help? Thank you.
Ignore patterns with fewer path segments can take precedence over patterns with more path segments, so in your case /modules/shop/* is taking precedence over !/modules/shop/invoice_templates/cm_standard/*, effectively pruning the whole of /modules/shop/invoice_templates/ from the directory traversal even before it looks at the contents of !/modules/shop/invoice_templates/cm_standard. Having said that, ordering can matter too, and when it does, somewhat counter-intuitively later rules within a file take precedence over earlier ones.
This question is very similar to How do gitignore exclusion rules actually work? so I suggest you read that. Also you may find the check-ignore subcommand useful when debugging rules - I added it to git over the last few months and it just appeared in version 1.8.2 which was released a few days ago.

Difference in the paths in .gitignore file?

I've been using git but still having confusion about the .gitignore file paths.
So, what is the difference between the following two paths in .gitignore file?
tmp/*
public/documents/**/*
I can understand that tmp/* will ignore all the files and folders inside it. Am I right?
But what does that second line path mean?
This depends on the behavior of your shell. Git doesn't do any work to determine how to expand these. In general, * matches any single file or folder:
/a/*/z
matches /a/b/z
matches /a/c/z
doesn't match /a/b/c/z
** matches any string of folders:
/a/**/z
matches /a/b/z
matches /a/b/c/z
matches /a/b/c/d/e/f/g/h/i/z
doesn't match /a/b/c/z/d.pr0n
Combine ** with * to match files in an entire folder tree:
/a/**/z/*.pr0n
matches /a/b/c/z/d.pr0n
matches /a/b/z/foo.pr0n
doesn't match /a/b/z/bar.txt
Update (08-Mar-2016)
Today, I am unable to find a machine where ** does not work as claimed. That includes OSX-10.11.3 (El Capitan) and Ubuntu-14.04.1 (Trusty). Possibly git-ignore as been updated, or possibly recent fnmatch handles ** as people expect. So the accepted answer now seems to be correct in practice.
Original post
The ** has no special meaning in git. It is a feature of bash >= 4.0, via
shopt -s globstar
But git does not use bash. To see what git actually does, you can experiment with git add -nv and files in several levels of sub-directories.
For the OP, I've tried every combination I can think of for the .gitignore file, and nothing works any better than this:
public/documents/
The following does not do what everyone seems to think:
public/documents/**/*.obj
I cannot get that to work no matter what I try, but at least that is consistent with the git docs. I suspect that when people add that to .gitignore, it works by accident, only because their .obj files are precisely one sub-directory deep. They probably copied the double-asterisk from a bash script. But perhaps there are systems where fnmatch(3) can handle the double-asterisk as bash can.
If you're using a shell such as Bash 4, then ** is essentially a recursive version of *, which will match any number of subdirectories.
This makes more sense if you add a file extension to your examples. To match log files immediately inside tmp, you would type:
/tmp/*.log
To match log files anywhere in any subdirectory of tmp, you would type:
/tmp/**/*.log
But testing with git version 1.6.0.4 and bash version 3.2.17(1)-release, it appears that git does not support ** globs at all. The most recent man page for gitignore doesn't mention **, either, so this is either (1) very new, (2) unsupported, or (3) somehow dependent on your system's implementation of globbing.
Also, there's something subtle going on in your examples. This expression:
tmp/*
...actually means "ignore any file inside a tmp directory, anywhere in the source tree, but don't ignore the tmp directories themselves". Under normal circumstances, you'd probably just write:
/tmp
...which would ignore a single top-level tmp directory. If you do need to keep the tmp directories around, while ignoring their contents, you should place an empty .gitignore file in each tmp directory to make sure that git actually creates the directory.
Note that the '**', when combined with a sub-directory (**/bar), must have changed from its default behavior, since the release note for git1.8.2 now mentions:
The patterns in .gitignore and .gitattributes files can have **/, as a pattern that matches 0 or more levels of subdirectory.
E.g. "foo/**/bar" matches "bar" in "foo" itself or in a subdirectory of "foo".
See commit 4c251e5cb5c245ee3bb98c7cedbe944df93e45f4:
"foo/**/bar" matches "foo/x/bar", "foo/x/y/bar"... but not "foo/bar".
We make a special case, when foo/**/ is detected (and "foo/" part is already matched), try matching "bar" with the rest of the string.
"Match one or more directories" semantics can be easily achieved using "foo/*/**/bar".
This also makes "**/foo" match "foo" in addition to "x/foo", "x/y/foo"..
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds#gmail.com>
Simon Buchan also commented:
current docs (.gitignore man page) are pretty clear that no subdirectory is needed, x/** matches all files under (possibly empty) x
The .gitignore man page does mention:
A trailing "/**" matches everything inside. For example, "abc/**" matches all files inside directory "abc", relative to the location of the .gitignore file, with infinite depth.
A slash followed by two consecutive asterisks then a slash matches zero or more directories. For example, "a/**/b" matches "a/b", "a/x/b", "a/x/y/b" and so on.
When ** isn't supported, the "/" is essentially a terminating character for the wildcard, so when you have something like:
public/documents/**/*
it is essentially looking for two wildcard items in between the slashes and does not pick up the slashes themselves. Consequently, this would be the same as:
public/documents/*/*
It doesn't work for me but you could create a new .gitignore in that subdirectory:
tmp/**/*.log
can be replaced by a .gitignore in tmp:
*.log