find - globbing in path to search - find

My question is simple and I couldn't find an answer on Google:
Why, if I type:
find *h
or
find *g
or any other character following the star, is the result all files in the current directory and its subdirectories?
The same result also appears for
find *
which is obvious. I guess the star (*) acts here as the directory where to start searching, not the file pattern to search for, so * expands to 'all directories in the current directory'. In this case it will search in all directories and find all files, which is the expected behavior. But why, if I provide '*g' as the directory to start searching, does it also find all files, even though there is not a single directory which starts with 'g'?

What you are describing is not how it works. The shell expands *g to all the files and directories in the current directory whose names end with g, and find then acts on that list.
As @Barmar points out in a comment, what you describe sounds like you have no matches for *g and the nullglob option set in your shell, which causes a wildcard expression with no matches to expand to nothing. find then runs with no starting point at all, and GNU find defaults to the current directory in that case, so it lists everything. (The default behavior is to leave the pattern unexpanded, which would cause an error message from find.)
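To make the order of events concrete, here is a minimal Python sketch of what the shell does to the word before find ever sees it. The helper names (expand_like_shell, find_argv) and the sample file names are made up for illustration, and the nullglob flag only approximates the bash option of the same name.

```python
import glob
import os
import tempfile

def expand_like_shell(pattern, directory, nullglob=False):
    """Hypothetical helper: approximate shell expansion of one glob word."""
    full = os.path.join(glob.escape(directory), pattern)
    matches = sorted(os.path.basename(p) for p in glob.glob(full))
    if matches:
        return matches
    # No match: bash passes the word through literally unless nullglob is set.
    return [] if nullglob else [pattern]

def find_argv(pattern, directory, nullglob=False):
    """Argument list that find would actually receive."""
    return ["find"] + expand_like_shell(pattern, directory, nullglob)

# Throwaway directory standing in for the current directory.
with tempfile.TemporaryDirectory() as cwd:
    for name in ("img.png", "debug.log", "notes.txt"):
        open(os.path.join(cwd, name), "w").close()
    with_match = find_argv("*g", cwd)
    literal = find_argv("*h", cwd)
    nothing = find_argv("*h", cwd, nullglob=True)

print(with_match)  # ['find', 'debug.log', 'img.png']
print(literal)     # ['find', '*h']  -> find complains about a missing path
print(nothing)     # ['find']        -> GNU find falls back to '.'
```

The last case is the one that produces "all files": the pattern vanished, so find was started without any arguments.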

Related

VS Code: Search multiple directories

When searching, VS Code has the ability to list files to include to scope the search. This is used by default when using the "find in folder" feature. For example, searching src results in ./src as the files to include.
Is there a syntax I can use to list multiple directories here? For example, I want to search ./src and ./lib in one search.
Did you try a comma-separated list, like ./dir1, ./dir2? It seems to work for me.
By the way, here is the documentation of 'files to include': https://code.visualstudio.com/Docs/editor/codebasics#_advanced-search-options
In particular, you can use glob notation. Also, VS Code will include/exclude certain directories or files by default, depending on your settings.json, in case anyone still sees unexpected behaviour.

vifm search files in subfolders

How can I search files just like with / command but recursively scanning subfolders?
Or maybe there are other approaches to get a list of files that match some pattern in the current folder including all subfolders.
:find command
There is the :fin[d] command for that. Internally it invokes the find utility (this is configurable via the 'findprg' option), so you can do everything find is capable of. That said, in most cases the simple form of the command suffices:
:find *.sh
Note that by default the argument is treated as a regular file pattern (the -name option of find), which is different from the regular expressions accepted by /. To search via regexp, use:
:find -regex '.*_.*'
If you want to scan only specific subfolders, just select them before running the command and the search will be limited to those directories.
The :find command brings up a menu with the search results. If you want to process them like regular files (e.g. delete, copy, move), hit b to change the list representation.
Alternative that uses /
Alternatively, you can populate the current view with a list of files in all subdirectories with a command like this (see %u):
:!find%u
and then use /, although this might be less efficient.

How to search a text among c files under a directory

I've looked through several similar questions, but either I didn't understand their answers or my question is different from theirs. I have a project that contains many subdirectories and different types of files. I would like to search for a function name among the .C files only.
Some information on the web suggests using "Esc x dired-do-query-replace-regexp". However, this will search not just C files but also other files like .elf, which isn't helpful in my case. Other people suggest using the TAGS facility, but it would require me to type "etags *.c" in every subdirectory, which is also impractical.
How should I do this while working on such a large-scale software project?
Thanks
Lee
Use ack-grep on linux
ack-grep "keyword" -G *.c
My favorite: igrep-find, found in the package igrep.el. Usage is:
M-x igrep-find some_thing RET *.C
There's the built in grep-find, docs here, but I find it awkward to use.
For a more general answer, see this similar question: Using Emacs For Big Big Projects.
If you're on Linux, you can use grep to find files with a certain text in them. You would do this outside of Emacs, in your shell/command prompt. Here's a nice syntax:
grep --color=auto --include='*.c' -iRnH 'string to search for' /dir/to/search/
The directory to search can be given as a relative path, so if you're already in the directory you want to use as the root of your recursive search, you can skip the whole directory address and specify a single dot.
grep --color=auto --include='*.c' -iRnH 'string to search for' .
The --color=auto part highlights the matching text. --include='*.c' specifies which files to search, in this case only files with the .c extension; it is quoted so that the shell does not expand *.c before grep sees it. The flag i makes the search case-insensitive, the flag R makes it recursive, the flag n adds the line number to the report, and the flag H adds the file path to the report.
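For anyone who wants to see what those flags amount to, here is a rough Python sketch of the same search. It is only an approximation under a few assumptions: the needle is treated as a fixed string rather than a regular expression, symlinked directories are not followed, and the function name grep_tree and the sample files are invented for the demo.

```python
import fnmatch
import os
import tempfile

def grep_tree(root, needle, include="*.c"):
    """Yield (path, line_number, line) for case-insensitive matches of needle."""
    needle = needle.lower()
    for dirpath, _dirnames, filenames in os.walk(root):          # like -R
        for name in filenames:
            if not fnmatch.fnmatchcase(name, include):           # like --include
                continue
            path = os.path.join(dirpath, name)                   # like -H
            with open(path, errors="replace") as handle:
                for number, line in enumerate(handle, start=1):  # like -n
                    if needle in line.lower():                   # like -i
                        yield path, number, line.rstrip("\n")

# Demo on a throwaway tree; the .elf file must not show up in the results.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    samples = {
        "main.c": "int x;\ninit_board();\n",
        os.path.join("sub", "util.c"): "void Init_Board(void) {}\n",
        "firmware.elf": "init_board\n",
    }
    for rel, text in samples.items():
        with open(os.path.join(root, rel), "w") as handle:
            handle.write(text)
    hits = sorted(
        (os.path.relpath(path, root), number, line)
        for path, number, line in grep_tree(root, "init_board")
    )

for hit in hits:
    print(hit)
```

Only main.c and sub/util.c are reported, each with its line number, which is the same filtering that --include does on the grep command line.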
To combine find and grep there is the find-grep function; there you can change the invocation string to find . -name '*.c' etc. (quote the pattern so the shell does not expand it). Make it a function if you like. Then use e.g. C-x ` and friends to navigate the results.
To search among the files in one directory I use lgrep, which prompts you for the files to search.
You can use cscope and xcscope.el : http://www.emacswiki.org/emacs/CScopeAndEmacs
Try with dired: place the cursor on the directory name to search, type A and in the minibuffer the text to find.

Difference in the paths in .gitignore file?

I've been using git but am still confused about the paths in the .gitignore file.
So, what is the difference between the following two paths in .gitignore file?
tmp/*
public/documents/**/*
I can understand that tmp/* will ignore all the files and folders inside it. Am I right?
But what does that second line path mean?
This depends on the behavior of your shell. Git doesn't do any work to determine how to expand these. In general, * matches any single file or folder:
/a/*/z
matches /a/b/z
matches /a/c/z
doesn't match /a/b/c/z
** matches any string of folders:
/a/**/z
matches /a/b/z
matches /a/b/c/z
matches /a/b/c/d/e/f/g/h/i/z
doesn't match /a/b/c/z/d.pr0n
Combine ** with * to match files in an entire folder tree:
/a/**/z/*.pr0n
matches /a/b/c/z/d.pr0n
matches /a/b/z/foo.pr0n
doesn't match /a/b/z/bar.txt
Update (08-Mar-2016)
Today, I am unable to find a machine where ** does not work as claimed. That includes OSX-10.11.3 (El Capitan) and Ubuntu-14.04.1 (Trusty). Possibly git-ignore has been updated, or possibly recent fnmatch handles ** as people expect. So the accepted answer now seems to be correct in practice.
Original post
The ** has no special meaning in git. It is a feature of bash >= 4.0, via
shopt -s globstar
But git does not use bash. To see what git actually does, you can experiment with git add -nv and files in several levels of sub-directories.
For the OP, I've tried every combination I can think of for the .gitignore file, and nothing works any better than this:
public/documents/
The following does not do what everyone seems to think:
public/documents/**/*.obj
I cannot get that to work no matter what I try, but at least that is consistent with the git docs. I suspect that when people add that to .gitignore, it works by accident, only because their .obj files are precisely one sub-directory deep. They probably copied the double-asterisk from a bash script. But perhaps there are systems where fnmatch(3) can handle the double-asterisk as bash can.
If you're using a shell such as Bash 4, then ** is essentially a recursive version of *, which will match any number of subdirectories.
This makes more sense if you add a file extension to your examples. To match log files immediately inside tmp, you would type:
/tmp/*.log
To match log files anywhere in any subdirectory of tmp, you would type:
/tmp/**/*.log
But testing with git version 1.6.0.4 and bash version 3.2.17(1)-release, it appears that git does not support ** globs at all. The most recent man page for gitignore doesn't mention **, either, so this is either (1) very new, (2) unsupported, or (3) somehow dependent on your system's implementation of globbing.
Also, there's something subtle going on in your examples. This expression:
tmp/*
...actually means "ignore any file inside a tmp directory, anywhere in the source tree, but don't ignore the tmp directories themselves". Under normal circumstances, you'd probably just write:
/tmp
...which would ignore a single top-level tmp directory. If you do need to keep the tmp directories around, while ignoring their contents, you should place an empty .gitignore file in each tmp directory to make sure that git actually creates the directory.
Note that '**', when combined with a sub-directory (**/bar), must have changed from its default behavior, since the release notes for git 1.8.2 now mention:
The patterns in .gitignore and .gitattributes files can have **/, as a pattern that matches 0 or more levels of subdirectory.
E.g. "foo/**/bar" matches "bar" in "foo" itself or in a subdirectory of "foo".
See commit 4c251e5cb5c245ee3bb98c7cedbe944df93e45f4:
"foo/**/bar" matches "foo/x/bar", "foo/x/y/bar"... but not "foo/bar".
We make a special case, when foo/**/ is detected (and "foo/" part is already matched), try matching "bar" with the rest of the string.
"Match one or more directories" semantics can be easily achieved using "foo/*/**/bar".
This also makes "**/foo" match "foo" in addition to "x/foo", "x/y/foo"..
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Simon Buchan also commented:
current docs (.gitignore man page) are pretty clear that no subdirectory is needed, x/** matches all files under (possibly empty) x
The .gitignore man page does mention:
A trailing "/**" matches everything inside. For example, "abc/**" matches all files inside directory "abc", relative to the location of the .gitignore file, with infinite depth.
A slash followed by two consecutive asterisks then a slash matches zero or more directories. For example, "a/**/b" matches "a/b", "a/x/b", "a/x/y/b" and so on.
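The two man page rules quoted above can be tried out with a small Python sketch. This is a deliberately simplified translation to a regular expression, not git's implementation: it only understands literal text, *, a leading or inner **/ and a trailing /**, and the function names are hypothetical.

```python
import re

def simple_ignore_regex(pattern):
    """Hypothetical, simplified translation of a slash-separated pattern.

    Not git's matcher: no '?', '[...]', '!' or trailing-slash rules.
    """
    parts = []
    i, n = 0, len(pattern)
    while i < n:
        at_segment_start = i == 0 or pattern[i - 1] == "/"
        if at_segment_start and pattern.startswith("**/", i):
            parts.append("(?:[^/]+/)*")   # zero or more directories
            i += 3
        elif pattern.startswith("/**", i) and i + 3 == n:
            parts.append("/.+")           # everything inside, any depth
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")         # never crosses a slash
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")

def matches(pattern, path):
    """True if the whole path matches the simplified pattern."""
    return simple_ignore_regex(pattern).match(path) is not None

print(matches("a/**/b", "a/b"))        # True: zero directories in between
print(matches("a/**/b", "a/x/y/b"))    # True
print(matches("a/*/b", "a/x/y/b"))     # False: * covers one level only
print(matches("abc/**", "abc/x/y"))    # True
print(matches("public/documents/*/*", "public/documents/a/b/c.obj"))   # False
print(matches("public/documents/**/*", "public/documents/a/b/c.obj"))  # True
```

The last two lines show the practical difference between the single and double asterisk for the pattern in the question.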
When ** isn't supported, the "/" is essentially a terminating character for the wildcard, so when you have something like:
public/documents/**/*
it is essentially looking for two wildcard items in between the slashes and does not pick up the slashes themselves. Consequently, this would be the same as:
public/documents/*/*
It doesn't work for me but you could create a new .gitignore in that subdirectory:
tmp/**/*.log
can be replaced by a .gitignore in tmp:
*.log

Why do directory listings contain the current (.) and parent (..) directory?

Whenever I list the contents of a directory with a function like readdir, the returned file names also include "." and "..". I have the suspicion that these are just normal links in the file system and therefore indistinguishable from actual files, but I always have to filter them out because they are not actual objects in the directory I am listing. Is there a good reason for functions like readdir to include them? Do some operating systems or file systems contain more or different virtual file names? Is there a better way to filter them out other than by doing string comparison with "." and ".."?
Update: thank you all for answering. I suppose I always thought that things like ./ and ../ were mere conventions that could be handled by searching and replacing. I find it a bit surprising, though probably more efficient and transparent, to have them be part of the file system itself.
One question remains, though: since . and .. are arbitrary names for these links, are there file systems that use different ones?
. and .. are actually hard links in filesystems. They are needed so that you can specify relative paths, based on some reference path (consider "../sibling/file.txt"). Since these hard links are actually existing in the filesystem, it makes sense for readdir to tell you about them. (actually the term hard link just means some name that is indistinguishable from the actual directory referred to: they both point to the same inode in the filesystem).
Best way is to just strcmp and ignore them, if you don't want to list them.
Originally they were hard links, and the number of special cases in the filesystem code for . and .. was minimal. That's not true for all modern filesystems, however.
But the conventions have been established, so even filesystems where these two directory entries don't actually exist still report their existence through APIs like readdir. Changing this now would break a lot of code.
"I have the suspicion that these are just normal links in the file system and therefore indistinguishable from actual files"
They are. While you may perceive the file system as a hierarchy of "folders" "containing" folders, it is actually a doubly linked tree (see note 1), with directories being nodes and files being leaves. So, . and .. are needed links for accessing the leaves of the current node and for traversing the tree, and they are the same thing as all the other links.
When you call readdir, you get all the places you can directly go to from the current node. If you do not want to list places that you perceive as "up", you have to sort them out yourself. You should write a little function for that, perhaps called readdir_down. I do not know in which order readdir lists the directories, but perhaps you can just throw away the first two entries.
1) this is a first approximation, there are also "hard links" possible that make the tree actually a net.
One reason is that without them there is no way to get to the parent directory. Or get a handle to the current directory.
Without them, we cannot do such things as:
./run_this
Indeed, we couldn't add '.' to the $PATH, meaning we couldn't ever execute files that weren't already in the path.
These are normal directories, they are "hard links" to the current directory and directory above. They are present in all directories (even at the root level, where .. is exactly the same as .).
When using ls, you can filter out . and .. with ls -A (note the capital -A).
When applying a command to all dot-files, but not . or .., I often use .??* which matches only dot-files with a name of three characters or more.
touch .??*
Note this pattern also excludes any other file that begins with dot and is only two characters long (e.g. .x) but those files are uncommon.
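A quick way to check what that pattern covers is Python's fnmatch module, which implements the same ? and * wildcards. The list of names below is made up for the demonstration.

```python
from fnmatch import fnmatchcase

PATTERN = ".??*"  # a dot, two arbitrary characters, then anything

names = [".", "..", ".x", ".ab", ".bashrc", "README"]
matched = [name for name in names if fnmatchcase(name, PATTERN)]

print(matched)  # ['.ab', '.bashrc']
```

As expected, . and .. are skipped, but so is the two-character name .x.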
When using programmatic file-listers like readdir() I do have to exclude . and .. manually. These two entries usually come first in the list returned by readdir(), although the order is not guaranteed, so where that holds you can do this:
@files = readdir(DIR);
for (1..2) { shift @files; } # get rid of . and ..
# go on with your business
They are reported because they are stored in the directory listing. That's the way unices have always worked.
Because on Unix-like operating systems, the directory-listing commands include those, and you use them to move up and down in the filesystem hierarchy.
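Something like grep { not /^\.{1,2}\z/ } readdir HANDLE should work for you. (The dot has to be escaped; otherwise the pattern also drops every other file name that is one or two characters long.)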
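The same filter, sketched in Python on a hand-written list of names, shows why the dot in such a pattern needs escaping. Note that Python's own os.listdir already leaves out . and .., so the list here only stands in for what a raw readdir would return.

```python
import re

def drop_dot_entries(names):
    """Exact string comparison: the simplest and safest filter."""
    return [name for name in names if name not in (".", "..")]

ESCAPED = re.compile(r"^\.{1,2}\Z")    # one or two literal dots
UNESCAPED = re.compile(r"^.{1,2}\Z")   # any one or two characters (too broad)

names = [".", "..", "ab", ".x", "src", ".git"]

kept_exact = drop_dot_entries(names)
kept_escaped = [n for n in names if not ESCAPED.match(n)]
kept_unescaped = [n for n in names if not UNESCAPED.match(n)]

print(kept_exact)      # ['ab', '.x', 'src', '.git']
print(kept_escaped)    # ['ab', '.x', 'src', '.git']
print(kept_unescaped)  # ['src', '.git']  -> 'ab' and '.x' were lost
```

Plain string comparison and the escaped regex agree; the unescaped version silently hides short file names.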
There is no good reason a directory scan should return these filenames.