Why do directory listings contain the current (.) and parent (..) directory? - operating-system

Whenever I list the contents of a directory with a function like readdir, the returned file names also include "." and "..". I have the suspicion that these are just normal links in the file system and therefore indistinguishable from actual files, but I always have to filter them out because they are not actual objects in the directory I am listing. Is there a good reason for functions like readdir to include them? Do some operating systems or file systems contain more or different virtual file names? Is there a better way to filter them out other than by doing string comparison with "." and ".."?
Update: thank you all for answering. I suppose I always thought that things like ./ and ../ were mere conventions that could be handled by searching and replacing. I find it a bit surprising, though probably more efficient and transparent, to have them be part of the file system itself.
One question remains, though: since . and .. are arbitrary names for these links, are there file systems that use different ones?

. and .. are actually hard links in the filesystem. They are needed so that you can specify relative paths based on some reference path (consider "../sibling/file.txt"). Since these hard links actually exist in the filesystem, it makes sense for readdir to tell you about them. (The term hard link just means a name that is indistinguishable from the actual directory referred to: both point to the same inode in the filesystem.)
The best way is to just strcmp each name against "." and ".." and ignore those two if you don't want to list them.

Originally they were hard links, and the number of special cases in the filesystem code for . and .. was minimal. That's not true for all modern filesystems, however.
But the conventions have been established, so even filesystems where these two directory entries don't actually exist still report their existence through APIs like readdir. Changing this now would break a lot of code.

"I have the suspicion that these are just normal links in the file system and therefore indistinguishable from actual files"
They are. While you may perceive the file system as a hierarchy of "folders" "containing" folders, it is actually a doubly linked tree [1], with directories being nodes and files being leaves. So . and .. are links needed for accessing the leaves of the current node and for traversing the tree, and they are the same kind of thing as all the other links.
When you call readdir, you get all the places you can go to directly from the current node. If you do not want to list places that you perceive as "up", you have to filter them out yourself. You should write a little function for that, perhaps called readdir_down. I do not know of any guarantee about the order in which readdir lists entries, so rather than throwing away the first two, compare each name against "." and "..".
[1] This is a first approximation; "hard links" are also possible, which make the tree actually a net.
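A minimal sketch of such a readdir_down helper, written in Python purely for illustration. It assumes the raw entry names have already been obtained from some readdir-style call; raw_listing below is made-up sample data. It filters by name and makes no assumption about the order of entries.

```python
def readdir_down(raw_names):
    """Return only the entries that lead 'down', dropping '.' and '..'.

    raw_names is any iterable of entry names as a readdir-style call
    would report them, in whatever order the file system returned them.
    """
    return [name for name in raw_names if name not in (".", "..")]


# Hypothetical raw listing: note that '.' and '..' are not the first two.
raw_listing = ["..", "notes.txt", ".", ".x", "..."]
children = readdir_down(raw_listing)
print(children)
```

Names such as ".x" or "..." are ordinary entries and are kept. Python's own os.listdir and os.scandir already omit the two special entries, so a helper like this is only needed when working with a lower-level API.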

One reason is that without them there is no way to get to the parent directory. Or get a handle to the current directory.
Without them, we cannot do such things as:
./run_this
Indeed, we couldn't add '.' to the $PATH, meaning we couldn't ever execute files that weren't already in the path.

These are normal directory entries: they are "hard links" to the current directory and the directory above. They are present in all directories (even at the root level, where .. is exactly the same as .).
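One way to check this claim is to compare device and inode numbers. The following is a small sketch in Python, assuming a Unix-like system; same_directory is a hypothetical helper name, and the temporary directory exists only for the demonstration.

```python
import os
import tempfile


def same_directory(path_a: str, path_b: str) -> bool:
    """True when both paths resolve to the same inode on the same device."""
    stat_a, stat_b = os.stat(path_a), os.stat(path_b)
    return (stat_a.st_dev, stat_a.st_ino) == (stat_b.st_dev, stat_b.st_ino)


with tempfile.TemporaryDirectory() as parent:
    child = os.path.join(parent, "child")
    os.mkdir(child)
    # child/. should be child itself, child/.. should be its parent.
    dot_is_self = same_directory(os.path.join(child, "."), child)
    dotdot_is_parent = same_directory(os.path.join(child, ".."), parent)

# At the root, '..' is expected to refer to the root itself.
root_dotdot_is_root = same_directory("/..", "/")
print(dot_is_self, dotdot_is_parent, root_dotdot_is_root)
```

On a typical Unix-like system all three checks are expected to hold.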
When using ls, you can filter out . and .. with ls -A (note the capital -A).
When applying a command to all dot-files, but not . or .., I often use .??* which matches only dot-files with a name of three characters or more.
touch .??*
Note this pattern also excludes any other file that begins with dot and is only two characters long (e.g. .x) but those files are uncommon.
When using programmatic file-listers like readdir() I do have to exclude . and .. manually. These two entries often come first in the list returned by readdir(), but that order is not guaranteed, so it is safer to filter them out by name:
@files = readdir(DIR);
@files = grep { $_ ne '.' && $_ ne '..' } @files; # get rid of . and ..
# go on with your business

They are reported because they are stored in the directory listing. That's the way unices have always worked.

Because on Unix-like operating systems, the directory-listing commands include those, and you use them to move up and down in the filesystem hierarchy.
Something like grep { not /^\.{1,2}\z/ } readdir HANDLE should work for you.

There is no good reason a directory scan should return these filenames.

Why does Matlab find '.' and '..' dirs when reading folder content using dir('')? [duplicate]

The ls -ai command shows that . and .. have their inodes the same as the current directory and parent directory, respectively.
What exactly are . and ..?
Are they real files, or even hard links? As far as I know, it's not allowed to create a hard link to a directory.
. represents the directory you are in and .. represents the parent directory.
From the dot definition:
This is a short string (i.e., sequence of characters) that is added to
the end of the base name (i.e., the main part of the name) of a file
or directory in order to indicate the type of file or directory.
On Unix-like operating systems every directory contains, as a minimum,
an object represented by a single dot and another represented by two
successive dots. The former refers to the directory itself and the
latter refers to its parent directory (i.e., the directory that
contains it). These items are automatically created in every
directory, as can be seen by using the ls command with its -a option
(which instructs it to show all of its contents, including hidden
items).
They are special name-to-inode mappings which do count as hard links (they do increase the link count), though they aren't really hard links since, as you said, directories can't have hard links. Read more here: Hard links and Unix file system nodes (inodes)
. represents the current directory that you are using and
.. represents the parent directory.
Example:
Suppose you are in the directory /etc/mysql and you want to move to the parent directory, i.e. /etc/. Then use cd ..:
/etc/mysql> cd ..
And if you want to refer to a file in the current directory, for example in a bash script, prefix the file name with ./ like this: ./filename
They are not hard links. You can think of them more as shorthand for this directory (.) and the parent of this directory (..).
Try to remove or rename . or .. and you will understand why they are not hard links.

Documentation for nautilus and GIO's .hidden file feature?

I just discovered some mentions of how nautilus used to read files named .hidden and hide files matching the patterns listed in them, and at some point that feature was moved to GIO g_file_info_get_is_hidden. However, I haven't been able to get it to work. If I put the exact name of a file into .hidden, it does get hidden, but I'd really like to be able to use a pattern. I can't find any solid or recent documentation about how this feature is supposed to work.
I'd particularly like to hide files matching hg-checkexec-*. Mercurial running under Emacs periodically creates bunches of these temporary files and they gum up my nautilus view.
Is this feature documented anywhere? How is it supposed to work?
Looking at the code, .hidden files as implemented in GIO support one filename per line, with no support for patterns. A .hidden file cannot list files in subdirectories — only those in the same directory.
I don’t know of any documentation about the feature. Please file a bug about adding it.
As a complement to Philip Withnall's answer, I've dived further into the source code, specifically the functions read_hidden_file() and file_is_hidden():
read_hidden_file() basically parses the .hidden file in a directory and stores each line as a key in a GLib HashTable object.
The object is created using g_hash_table_new_full() with parameters g_str_hash, g_str_equal, g_free, NULL. This means keys are plain strings, compared with plain (case-sensitive) string equality, so there is no support for globs, regexes or any other patterns.
It is populated using g_hash_table_add(), so it is not used as a key/value table but rather as a plain set, with the keys being the elements themselves.
file_is_hidden() is called for each entry (file or sub-directory) in a given directory. It uses g_hash_table_contains() to check whether the entry's basename is a key in the above object, so there is no pattern search whatsoever.
So, as Philip concluded, it seems there is indeed no support for any kind of globs, regexes or pattern search in .hidden files. I would also die for a .gitignore-like syntax.
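To make the consequence concrete, here is a rough Python analogy of the behaviour described above. It is not GIO's code: read_hidden_names and is_hidden are hypothetical names, skipping empty lines is an assumption, and the sample file names are made up. The point is only that a set of exact strings cannot act as a pattern.

```python
import os
import tempfile


def read_hidden_names(directory: str) -> set:
    """Collect each non-empty line of <directory>/.hidden into a plain set."""
    hidden_path = os.path.join(directory, ".hidden")
    try:
        with open(hidden_path, encoding="utf-8") as handle:
            return {line.rstrip("\n") for line in handle if line.strip()}
    except FileNotFoundError:
        return set()


def is_hidden(basename: str, hidden_names: set) -> bool:
    """Exact, case-sensitive membership test: no globs, no regexes."""
    return basename in hidden_names


with tempfile.TemporaryDirectory() as directory:
    with open(os.path.join(directory, ".hidden"), "w", encoding="utf-8") as handle:
        handle.write("notes.txt\nhg-checkexec-*\n")
    names = read_hidden_names(directory)

exact_match = is_hidden("notes.txt", names)
pattern_match = is_hidden("hg-checkexec-abc123", names)
print(exact_match, pattern_match)
```

With this kind of lookup, the line hg-checkexec-* would hide only a file literally named hg-checkexec-*, which matches what the question observed.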

Is there a Perl module that collapses file system paths such as a/b/.. or a//b?

I'm writing a program where I have to remove redundancy in paths, e.g.
a/b/.. -> a
a//b -> a/b
a/./b -> a/b
Does any existing module do this?
Update: This normalization/canonicalization is described by RFC 3986. I only need the path segment normalization part.
Of course, this is simple to implement. I'm still wondering if it's already been packaged into some module.
The form of the path and the meaning or relationships of its elements in the hierarchy of a URL are not specified in the standard. Depending on the server there could be no hierarchy at all: elements separated by / could be treated as positional, or their order could have no meaning at all. Because of that, there's no specific module to handle this task for URLs.
However, if you're absolutely sure about how the target server works, you can simply adapt File::Spec to your needs: extract the path from the URL (for example with URI), process it as if it were a file path, and then put it back.
Considering your comment that you'll be working with regular file names on a file system, you don't even need to extract anything from the path: File::Spec is enough for all your needs.
If you wish to work around File::Spec (by design) not resolving .., use its splitpath to extract the directory part of the name and splitdir to split it into directories, then iterate over that array, splice'ing out two elements (the .. and the element before it) each time you encounter a .. entry. Use catdir and catfile to pack the results back.
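The split, drop and rejoin approach can be sketched as follows. This is an illustrative Python version, not the Perl modules named above; collapse_path is a hypothetical name and the sketch assumes /-separated paths. It works purely on the text of the path, so if a/b is a symbolic link the collapsed result may name a different file than the original, which is presumably why File::Spec leaves .. alone.

```python
def collapse_path(path: str) -> str:
    """Lexically remove '.', empty and 'name/..' segments from a path."""
    is_absolute = path.startswith("/")
    stack = []
    for segment in path.split("/"):
        if segment in ("", "."):
            continue  # 'a//b' and 'a/./b' contribute nothing
        if segment == "..":
            if stack and stack[-1] != "..":
                stack.pop()  # 'a/b/..' cancels 'b'
            elif not is_absolute:
                stack.append("..")  # keep leading '..' in relative paths
            # for absolute paths, '..' at the root is simply dropped
            continue
        stack.append(segment)
    result = "/".join(stack)
    if is_absolute:
        return "/" + result
    return result or "."


for example in ("a/b/..", "a//b", "a/./b"):
    print(example, "->", collapse_path(example))
```

The three calls correspond to the three cases listed in the question.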

Delete multiple files with names containing a substring efficiently

I would like to delete multiple files whose names contain a substring. Say, for example, I would like to delete all the files that have the substring my. Assume that my directory contains 4 files: photo.jpg, myPhoto.jpg, beachMyPhoto.jpg, anyPhoto.jpg. Since the search term is my, the files that I am interested in deleting are myPhoto.jpg and beachMyPhoto.jpg (case insensitive).
My proposed solution (which I know how to do) is to use the NSFileManager class and its contentsOfDirectoryAtPath:error: method to read all the directory contents, and then loop over them searching for a hit. If a hit is found I delete that file.
What I don't like about my proposed solution is that it is not that efficient, especially if the directory contains many files and only a few of them match. Is there a more efficient way to do this?
If you don't want a big array loaded into memory, you can try -[NSFileManager enumeratorAtURL:includingPropertiesForKeys:options:errorHandler:]. Since you only want the immediate contents of the directory, you would invoke -[NSDirectoryEnumerator skipDescendants] for each directory that it returns.
If your concern is iterating over all of the items in the directory, testing for your match pattern, well that's unavoidable. Any technique you would hope to use has to somehow iterate over all of the items in the directory and test for a match. The only question is whether that iteration is exposed to you or not. In Cocoa, it is. You could drop down to the glob() function if you want an alternative where it isn't.
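As a language-neutral illustration of that unavoidable loop, here is a small Python sketch (not Cocoa). delete_matching is a hypothetical helper; it walks the directory lazily with os.scandir, keeps only the matching paths in memory, and deletes them after the scan has finished. The file names are the ones from the question, created in a temporary directory.

```python
import os
import tempfile


def delete_matching(directory: str, needle: str) -> list:
    """Delete regular files whose name contains needle, ignoring case."""
    needle = needle.lower()
    with os.scandir(directory) as entries:
        # Only the hits are collected, not the whole directory listing.
        matches = [entry.path for entry in entries
                   if entry.is_file() and needle in entry.name.lower()]
    for path in matches:
        os.remove(path)
    return sorted(os.path.basename(path) for path in matches)


with tempfile.TemporaryDirectory() as directory:
    for name in ("photo.jpg", "myPhoto.jpg", "beachMyPhoto.jpg", "anyPhoto.jpg"):
        open(os.path.join(directory, name), "w").close()
    deleted = delete_matching(directory, "my")
    remaining = sorted(os.listdir(directory))

print(deleted, remaining)
```

Every entry is still visited once; the saving is only in memory, since the full listing is never materialised.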

Difference in the paths in .gitignore file?

I've been using git but still having confusion about the .gitignore file paths.
So, what is the difference between the following two paths in .gitignore file?
tmp/*
public/documents/**/*
I can understand that tmp/* will ignore all the files and folders inside it. Am I right?
But what does that second line path mean?
This depends on the behavior of your shell. Git doesn't do any work to determine how to expand these. In general, * matches any single file or folder:
/a/*/z
matches /a/b/z
matches /a/c/z
doesn't match /a/b/c/z
** matches any string of folders:
/a/**/z
matches /a/b/z
matches /a/b/c/z
matches /a/b/c/d/e/f/g/h/i/z
doesn't match /a/b/c/z/d.pr0n
Combine ** with * to match files in an entire folder tree:
/a/**/z/*.pr0n
matches /a/b/c/z/d.pr0n
matches /a/b/z/foo.pr0n
doesn't match /a/b/z/bar.txt
Update (08-Mar-2016)
Today, I am unable to find a machine where ** does not work as claimed. That includes OSX-10.11.3 (El Capitan) and Ubuntu-14.04.1 (Trusty). Possibly git-ignore has been updated, or possibly recent fnmatch handles ** as people expect. So the accepted answer now seems to be correct in practice.
Original post
The ** has no special meaning in git. It is a feature of bash >= 4.0, via
shopt -s globstar
But git does not use bash. To see what git actually does, you can experiment with git add -nv and files in several levels of sub-directories.
For the OP, I've tried every combination I can think of for the .gitignore file, and nothing works any better than this:
public/documents/
The following does not do what everyone seems to think:
public/documents/**/*.obj
I cannot get that to work no matter what I try, but at least that is consistent with the git docs. I suspect that when people add that to .gitignore, it works by accident, only because their .obj files are precisely one sub-directory deep. They probably copied the double-asterisk from a bash script. But perhaps there are systems where fnmatch(3) can handle the double-asterisk as bash can.
If you're using a shell such as Bash 4, then ** is essentially a recursive version of *, which will match any number of subdirectories.
This makes more sense if you add a file extension to your examples. To match log files immediately inside tmp, you would type:
/tmp/*.log
To match log files anywhere in any subdirectory of tmp, you would type:
/tmp/**/*.log
But testing with git version 1.6.0.4 and bash version 3.2.17(1)-release, it appears that git does not support ** globs at all. The most recent man page for gitignore doesn't mention **, either, so this is either (1) very new, (2) unsupported, or (3) somehow dependent on your system's implementation of globbing.
Also, there's something subtle going on in your examples. This expression:
tmp/*
...actually means "ignore any file inside a tmp directory, anywhere in the source tree, but don't ignore the tmp directories themselves". Under normal circumstances, you'd probably just write:
/tmp
...which would ignore a single top-level tmp directory. If you do need to keep the tmp directories around, while ignoring their contents, you should place an empty .gitignore file in each tmp directory to make sure that git actually creates the directory.
Note that the '**', when combined with a sub-directory (**/bar), must have changed from its default behavior, since the release note for git 1.8.2 now mentions:
The patterns in .gitignore and .gitattributes files can have **/, as a pattern that matches 0 or more levels of subdirectory.
E.g. "foo/**/bar" matches "bar" in "foo" itself or in a subdirectory of "foo".
See commit 4c251e5cb5c245ee3bb98c7cedbe944df93e45f4:
"foo/**/bar" matches "foo/x/bar", "foo/x/y/bar"... but not "foo/bar".
We make a special case, when foo/**/ is detected (and "foo/" part is already matched), try matching "bar" with the rest of the string.
"Match one or more directories" semantics can be easily achieved using "foo/*/**/bar".
This also makes "**/foo" match "foo" in addition to "x/foo", "x/y/foo"..
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Simon Buchan also commented:
current docs (.gitignore man page) are pretty clear that no subdirectory is needed, x/** matches all files under (possibly empty) x
The .gitignore man page does mention:
A trailing "/**" matches everything inside. For example, "abc/**" matches all files inside directory "abc", relative to the location of the .gitignore file, with infinite depth.
A slash followed by two consecutive asterisks then a slash matches zero or more directories. For example, "a/**/b" matches "a/b", "a/x/b", "a/x/y/b" and so on.
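The difference between * and /**/ in the quoted rules can be illustrated with a deliberately simplified matcher. This Python sketch is not git's implementation and ignores most of the .gitignore rules (negation, anchoring, directory-only patterns, character classes); glob_to_regex and matches are hypothetical names. It only models "* stays within one path segment" and "a ** segment spans zero or more directories".

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a tiny subset of gitignore-style globs into a regex."""
    parts = pattern.split("/")
    out = []
    for index, part in enumerate(parts):
        last = index == len(parts) - 1
        if part == "**":
            # trailing '/**' matches everything inside; an inner '/**/'
            # matches zero or more whole directory levels
            out.append(".*" if last else "(?:[^/]+/)*")
            continue
        piece = "".join("[^/]*" if ch == "*" else re.escape(ch) for ch in part)
        out.append(piece if last else piece + "/")
    return re.compile("".join(out))


def matches(pattern: str, path: str) -> bool:
    return glob_to_regex(pattern).fullmatch(path) is not None


for path in ("a/b", "a/x/b", "a/x/y/b"):
    print(path, matches("a/**/b", path), matches("a/*/b", path))
```

Under these simplified rules, a/**/b accepts all three paths from the man page example, while a/*/b accepts only the one with exactly one intermediate directory.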
When ** isn't supported, the "/" is essentially a terminating character for the wildcard, so when you have something like:
public/documents/**/*
it is essentially looking for two wildcard items in between the slashes and does not pick up the slashes themselves. Consequently, this would be the same as:
public/documents/*/*
It doesn't work for me but you could create a new .gitignore in that subdirectory:
tmp/**/*.log
can be replaced by a .gitignore in tmp:
*.log