Rake FileList exclude method works in irb but not my rakefile - rake

Goal: Collect all files, complete with their directory structure, whose paths match a given directory pattern.
Wrinkle: Need to filter out a pesky undesired directory that matches but is thankfully uniquely named 'do-not-want'. Actual string changed to protect the innocent.
source/dir1/content/scripts - ok
source/dir2/subdir1/content/scripts - ok
source/dir3/do-not-want/content/scripts - well... do not want
The script below works but I have to do a separate check for the undesired path which should not be necessary. When I test this same FileList in irb with the exclude it works as desired. From my rakefile I see the do-not-want directories being returned by the FileList.
FileList['source/**/content/scripts'].exclude('do-not-want').each do |f|
  unless /do-not-want/ =~ f # hmm why does the exclude above not actually exclude do-not-want directories?
    Dir.chdir(f) do |d|
      puts "directory changed to #{d} and copying scripts from #{d} to common directory #{target}"
      FileUtils.cp_r('.', target)
    end
  end
end
Surely I am doing something dumb.
Bonus points: if you help me learn rake/ruby and show me a better way to accomplish same goal while defeating the wrinkle.

I think you should use FileList['source/**/content/scripts'].exclude(/\/do-not-want\//) to filter out paths that contain the "/do-not-want/" substring.
Try adding output to your rake file, so you can see and debug what's going on there.
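Rake is Ruby, so the following Python sketch is only an analogy, not Rake's implementation. It assumes, as Rake's documentation describes, that a string passed to exclude is treated as a glob over the whole file name while a Regexp may match anywhere in it. The path and the helper name without_component are made up for illustration.

```python
import fnmatch
import re

# Hypothetical path taken from the question's directory layout.
path = "source/dir3/do-not-want/content/scripts"

# A bare name used as a glob must match the WHOLE path, so it does not hit.
glob_hit = fnmatch.fnmatchcase(path, "do-not-want")

# A glob with wildcards on both sides, or a regex search, does hit.
wild_hit = fnmatch.fnmatchcase(path, "*/do-not-want/*")
regex_hit = re.search(r"/do-not-want/", path) is not None


def without_component(paths, unwanted):
    """Keep only paths that have no directory component equal to `unwanted`."""
    return [p for p in paths if unwanted not in p.split("/")]


candidates = [
    "source/dir1/content/scripts",
    "source/dir2/subdir1/content/scripts",
    path,
]
print(glob_hit, wild_hit, regex_hit)
print(without_component(candidates, "do-not-want"))
```

Comparing whole path components, as without_component does, also avoids accidentally dropping a directory whose name merely contains the unwanted string.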

Related

Duplicate Outputs in Doxygen

I'm generating developer documentation using Doxygen. It's parsing all of the files correctly, but the output is generating duplicate entries in the member function list and class diagram.
Any ideas?
I had this exact problem, and found that I had accidentally specified a build folder in the INPUT line due to RECURSIVE being on, e.g.,
Example file structure:
./
MyLibrarySources/
Libs/
build/
Doxyfile:
INPUT = ./ MyLibrarySources/ ...
RECURSIVE = YES
This caused Doxygen to parse the headers from two different locations: once from MyLibrarySources/, and once from build/, producing duplicate members and other odd results.
The easy solution is to add your build directory to the EXCLUDE line, e.g.:
EXCLUDE = "build"
This makes Doxygen not parse the same header files in two different locations. And yes, in-source build directories are usually a bad idea, place them elsewhere. In my case, command-line builds not issued from my IDE went there by default.
Edit note: I had incorrectly believed that the source files were being parsed twice because of the double-specification in the INPUT line. This is not the case. Doxygen is smart about this and will not parse the same physical file twice 👍.

Handle parallel build error correctly in emacs compilation mode

When I use M-x compile to run a parallel build with make -jn on a multi-level directory project and hit an error, next-error does not take me to the correct place. Emacs always looks for the problematic file in the wrong directory. I have no such problem if I build without -jn.
next-error uses the text output of your compilation to determine where to go. But with parallel compilation, this text output can be corrupted, and even if it is not corrupted it can be, and often is, ambiguous (think of one task compiling foo/bar and the other task compiling toto/titi, and the output looking like "entering directory foo; entering directory toto; error in bar:20; error in titi:69").
I can only think of the following ways to solve this problem:
structure your make files so that you never change directory (so all the file names are relative to the same current working directory).
change your make files so as to pass absolute file names to your compiler, so all the file names in error messages are absolute.
hack Emacs's compile.el so that when looking for "bar", it fetches it in all the directories that have been mentioned before.
This last change would probably be a good one (i.e. patch welcome), but note that it would still bump into problems if "bar" exists in both "foo" and "toto".
The other two changes can also still bump into problems because the output can also end up looking like "Entering directory foEntering directory toto; o;"; and I don't know what can be done to avoid this problem.

rename files & directories {searchstr} with {replacestr}

I have an application (Templify) that creates a templatized directory structure, but it seems to not be able to rename the "__NAME__" with what I've identified as the target.
This is fine if I can find a clean way to rename all files & directories with my replacement text.
I found a rename.pl method that renames files, and I found some code that removes underscores in file names and replaces it with spaces... but when I modify the code to put in my search terms, it never seems to work.
So, basically, I need to replace "__NAME__" with something like "Project-Name".
I'm happy to modify the search strings for each future reuse, but I'd love to figure out how to create a file to which I can pass ARGS.
I'm on XP and can use cygwin (cygwin doesn't seem to have 'rename', which makes it hard to find linux-type solutions that don't rely on the command called 'rename'....)
I did find this which is easy to use for files in the current directory, but I don't know enough to tell it to recurse into sub-directories.
Any help would be great.
Thanks,
Scott
From cygwin:
find /cygdrive/c/mytree -type f | perl -lne 'rename $_, "$1/Project-Name" if m[^(.*)/__NAME__$]'
Or using python:
import os
for root, dirs, files in os.walk("C:\\mytree"):
    for filename in files:
        if filename == "__NAME__":
            os.rename(os.path.join(root, filename), os.path.join(root, "Project-Name"))
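The question also asks for directories to be renamed and for the search and replacement strings to be passed as arguments. A minimal sketch of that, assuming a substring replacement in every name is what is wanted; the function name rename_tree and the script name in the usage line are hypothetical:

```python
import os
import sys


def rename_tree(top, search, replace):
    """Rename every file and directory under `top` whose name contains `search`."""
    renamed = []
    # topdown=False yields the deepest directories first, so a directory is
    # only renamed after everything inside it has already been visited.
    for root, dirs, files in os.walk(top, topdown=False):
        for name in dirs + files:
            if search in name:
                old_path = os.path.join(root, name)
                new_path = os.path.join(root, name.replace(search, replace))
                os.rename(old_path, new_path)
                renamed.append((old_path, new_path))
    return renamed


if __name__ == "__main__":
    if len(sys.argv) == 4:
        for old, new in rename_tree(sys.argv[1], sys.argv[2], sys.argv[3]):
            print(old, "->", new)
    else:
        print("usage: python rename_tree.py <top> <search> <replace>")
```

It could then be called as, for example, python rename_tree.py /cygdrive/c/mytree __NAME__ Project-Name. The top directory itself is left unrenamed.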

Difference in the paths in .gitignore file?

I've been using git but still having confusion about the .gitignore file paths.
So, what is the difference between the following two paths in .gitignore file?
tmp/*
public/documents/**/*
I can understand that tmp/* will ignore all the files and folders inside it. Am I right?
But what does that second line path mean?
This depends on the behavior of your shell. Git doesn't do any work to determine how to expand these. In general, * matches any single file or folder:
/a/*/z
matches /a/b/z
matches /a/c/z
doesn't match /a/b/c/z
** matches any string of folders:
/a/**/z
matches /a/b/z
matches /a/b/c/z
matches /a/b/c/d/e/f/g/h/i/z
doesn't match /a/b/c/z/d.pr0n
Combine ** with * to match files in an entire folder tree:
/a/**/z/*.pr0n
matches /a/b/c/z/d.pr0n
matches /a/b/z/foo.pr0n
doesn't match /a/b/z/bar.txt
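The difference between * and ** described above can be tried out with Python's glob module, whose recursive mode behaves like a shell globstar. This is only a sketch of shell-style expansion in a throwaway directory, not of what git does; the file names and the helpers make_tree and matches are made up.

```python
import glob
import os
import tempfile


def make_tree(root, files):
    """Create empty files (and their parent directories) under `root`."""
    for rel in files:
        full = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()


def matches(root, pattern):
    """Expand `pattern` below `root` and return sorted '/'-separated relative paths."""
    full_pattern = os.path.join(glob.escape(root), *pattern.split("/"))
    hits = glob.glob(full_pattern, recursive=True)
    return sorted(os.path.relpath(h, root).replace(os.sep, "/") for h in hits)


root = tempfile.mkdtemp()
make_tree(root, ["a/b/z/foo.log", "a/b/z/bar.txt", "a/b/c/z/d.log"])

single = matches(root, "a/*/z")           # exactly one level between a and z
double = matches(root, "a/**/z")          # any number of levels between a and z
combined = matches(root, "a/**/z/*.log")  # .log files in any z folder below a
print(single, double, combined, sep="\n")
```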
Update (08-Mar-2016)
Today, I am unable to find a machine where ** does not work as claimed. That includes OSX-10.11.3 (El Capitan) and Ubuntu-14.04.1 (Trusty). Possibly git-ignore has been updated, or possibly recent fnmatch handles ** as people expect. So the accepted answer now seems to be correct in practice.
Original post
The ** has no special meaning in git. It is a feature of bash >= 4.0, via
shopt -s globstar
But git does not use bash. To see what git actually does, you can experiment with git add -nv and files in several levels of sub-directories.
For the OP, I've tried every combination I can think of for the .gitignore file, and nothing works any better than this:
public/documents/
The following does not do what everyone seems to think:
public/documents/**/*.obj
I cannot get that to work no matter what I try, but at least that is consistent with the git docs. I suspect that when people add that to .gitignore, it works by accident, only because their .obj files are precisely one sub-directory deep. They probably copied the double-asterisk from a bash script. But perhaps there are systems where fnmatch(3) can handle the double-asterisk as bash can.
If you're using a shell such as Bash 4, then ** is essentially a recursive version of *, which will match any number of subdirectories.
This makes more sense if you add a file extension to your examples. To match log files immediately inside tmp, you would type:
/tmp/*.log
To match log files anywhere in any subdirectory of tmp, you would type:
/tmp/**/*.log
But testing with git version 1.6.0.4 and bash version 3.2.17(1)-release, it appears that git does not support ** globs at all. The most recent man page for gitignore doesn't mention **, either, so this is either (1) very new, (2) unsupported, or (3) somehow dependent on your system's implementation of globbing.
Also, there's something subtle going on in your examples. This expression:
tmp/*
...actually means "ignore any file inside a tmp directory, anywhere in the source tree, but don't ignore the tmp directories themselves". Under normal circumstances, you'd probably just write:
/tmp
...which would ignore a single top-level tmp directory. If you do need to keep the tmp directories around, while ignoring their contents, you should place an empty .gitignore file in each tmp directory to make sure that git actually creates the directory.
Note that the '**', when combined with a sub-directory (**/bar), must have changed from its default behavior, since the release note for git1.8.2 now mentions:
The patterns in .gitignore and .gitattributes files can have **/, as a pattern that matches 0 or more levels of subdirectory.
E.g. "foo/**/bar" matches "bar" in "foo" itself or in a subdirectory of "foo".
See commit 4c251e5cb5c245ee3bb98c7cedbe944df93e45f4:
"foo/**/bar" matches "foo/x/bar", "foo/x/y/bar"... but not "foo/bar".
We make a special case, when foo/**/ is detected (and "foo/" part is already matched), try matching "bar" with the rest of the string.
"Match one or more directories" semantics can be easily achieved using "foo/*/**/bar".
This also makes "**/foo" match "foo" in addition to "x/foo", "x/y/foo"..
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds#gmail.com>
Simon Buchan also commented:
current docs (.gitignore man page) are pretty clear that no subdirectory is needed, x/** matches all files under (possibly empty) x
The .gitignore man page does mention:
A trailing "/**" matches everything inside. For example, "abc/**" matches all files inside directory "abc", relative to the location of the .gitignore file, with infinite depth.
A slash followed by two consecutive asterisks then a slash matches zero or more directories. For example, "a/**/b" matches "a/b", "a/x/b", "a/x/y/b" and so on.
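The two man page rules quoted above can be approximated with a small regex translation. This is a rough sketch that covers only those two rules plus a single * that stops at slashes; it is not git's matcher and ignores everything else in the .gitignore syntax (leading **/, negation, character classes, anchoring). The function name gitignore_like_to_regex is hypothetical.

```python
import re


def gitignore_like_to_regex(pattern):
    """Translate a tiny subset of .gitignore syntax into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("/**/", i):
            # Slash, two asterisks, slash: zero or more directories.
            out.append("/(?:.*/)?")
            i += 4
        elif pattern.startswith("/**", i) and i + 3 == len(pattern):
            # Trailing "/**": everything inside, at any depth.
            out.append("/.+")
            i += 3
        elif pattern[i] == "*":
            # A single asterisk never crosses a slash.
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def matches(pattern, path):
    """True if the whole path matches the translated pattern."""
    return gitignore_like_to_regex(pattern).fullmatch(path) is not None


for candidate in ["a/b", "a/x/b", "a/x/y/b", "a/xb"]:
    print(candidate, matches("a/**/b", candidate))
```

Under the same translation, public/documents/*/* matches only paths exactly two levels below public/documents, which is the behaviour described for systems where ** is not supported.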
When ** isn't supported, the "/" is essentially a terminating character for the wildcard, so when you have something like:
public/documents/**/*
it is essentially looking for two wildcard items in between the slashes and does not pick up the slashes themselves. Consequently, this would be the same as:
public/documents/*/*
It doesn't work for me but you could create a new .gitignore in that subdirectory:
tmp/**/*.log
can be replaced by a .gitignore in tmp:
*.log

Why do directory listings contain the current (.) and parent (..) directory?

Whenever I list the contents of a directory with a function like readdir, the returned file names also include "." and "..". I have the suspicion that these are just normal links in the file system and therefore indistinguishable from actual files, but I always have to filter them out because they are not actual objects in the directory I am listing. Is there a good reason for functions like readdir to include them? Do some operating systems or file systems contain more or different virtual file names? Is there a better way to filter them out other than by doing string comparison with "." and ".."?
Update: thank you all for answering. I suppose I always thought that things like ./ and ../ were mere conventions that could be handled by searching and replacing. I find it a bit surprising, though probably more efficient and transparent, to have them be part of the file system itself.
One question remains, though: since . and .. are arbitrary names for these links, are there file systems that use different ones?
. and .. are actually hard links in filesystems. They are needed so that you can specify relative paths, based on some reference path (consider "../sibling/file.txt"). Since these hard links are actually existing in the filesystem, it makes sense for readdir to tell you about them. (actually the term hard link just means some name that is indistinguishable from the actual directory referred to: they both point to the same inode in the filesystem).
Best way is to just strcmp and ignore them, if you don't want to list them.
Originally they were hard links, and the number of special cases in the filesystem code for . and .. were minimal. That's not true for all modern filesystems, however.
But the conventions have been established so that even filesystems where these two directory entries don't actually exist still report their existence through APIs like readdir. Changing this now would break a lot of code.
"I have the suspicion that these are just normal links in the file system and therefore indistinguishable from actual files"
They are. While you may perceive the file system as a hierarchy of "folders" "containing" folders, it is actually a doubly linked tree (1), with directories being nodes and files being leaves. So, . and .. are needed links for accessing the leaves of the current node and for traversing the tree, and they are the same thing as all the other links.
When you call readdir, you get all the places you can directly go to from the current node. If you do not want to list places that you perceive as "up", you have to sort them out yourself. You should write a little function for that, perhaps called readdir_down. I do not know in which order readdir lists the directories, but perhaps you can just throw away the first two entries.
1) This is a first approximation; "hard links" are also possible, and they make the tree actually a net.
One reason is that without them there is no way to get to the parent directory. Or get a handle to the current directory.
Without them, we cannot do such things as:
./run_this
Indeed, we couldn't add '.' to the $PATH, meaning we couldn't ever execute files that weren't already in the path.
These are normal directories, they are "hard links" to the current directory and directory above. They are present in all directories (even at the root level, where .. is exactly the same as .).
When using ls, you can filter out . and .. with ls -A (note the capital -A).
When applying a command to all dot-files, but not . or .., I often use .??* which matches only dot-file with a name of three characters or more.
touch .??*
Note this pattern also excludes any other file that begins with dot and is only two characters long (e.g. .x) but those files are uncommon.
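The effect of the .??* pattern can be checked with Python's fnmatch module, which uses the same ? and * wildcards. The names below are made up for illustration:

```python
import fnmatch

# Hypothetical directory entries, including the two special ones.
names = [".", "..", ".x", ".git", ".bashrc", "README"]

# ".??*" needs a leading dot plus at least two more characters.
selected = fnmatch.filter(names, ".??*")
print(selected)
```

The two special entries and the two-character name .x are all skipped, as the note above says.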
When using programmatic file-listers like readdir() I do have to exclude . and .. manually. These two entries often come first in the list returned by readdir(), although that order is not guaranteed; when they do, you can do this:
@files = readdir(DIR);
for (1..2) { shift @files; } # get rid of . and .. (assumes they are the first two entries)
# go on with your business
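Because readdir makes no promise about the order of entries, filtering by name is safer than dropping the first two. A short Python sketch of the difference, using a made-up entry list; note that Python's own os.listdir already leaves out . and .., so the raw list here only stands in for what a C-level readdir loop might return:

```python
import os


def drop_dot_entries(names):
    """Remove the special '.' and '..' entries, wherever they appear."""
    return [n for n in names if n not in (".", "..")]


# Hypothetical raw listing in an order where the dot entries are not first.
raw = ["notes.txt", ".", ".hidden", "..", "src"]

by_name = drop_dot_entries(raw)
by_position = raw[2:]  # the "throw away the first two" approach

print(by_name)
print(by_position)

# os.listdir never reports the two special entries.
print("." in os.listdir(os.curdir), ".." in os.listdir(os.curdir))
```

With this ordering the positional approach drops a real file and keeps "..", while the name comparison keeps ordinary dot-files such as .hidden.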
They are reported because they are stored in the directory listing. That's the way unices have always worked.
Because on Unix-like operating systems, the directory-listing commands include those, and you use them to move up and down in the filesystem hierarchy.
Something like grep { not /^\.{1,2}\z/ } readdir HANDLE should work for you.
There is no good reason a directory scan should return these filenames.