find + cp sparse directory tree

I have a directory tree which, among other files, contains files that match certain patterns. For the sake of discussion, let's assume these are files matching *.foo, *.bar, or baz*. In my zsh script I want to back up only the files matching these patterns to a new directory.
The seemingly obvious solution,
find fromdir \( -name '*.foo' -o -name '*.bar' -o -name 'baz*' \) -exec cp {} todir \;
does not work, because the destination directory for a file such as fromdir/x/y/a.foo does not exist under todir.
I was thinking of using rsync, but I only know how to exclude certain files from being copied, not how to restrict copying to selected files.
I can solve the problem by writing a small auxiliary script, mdcp1file, like this:
#!/bin/zsh
set -u
mkdir -p "$2/${1:h}"  # Create destination directory if needed
cp "$1" "$2/${1:h}"   # Copy the file into its mirrored subdirectory
and use it in my find command instead of cp. I wonder whether there is an easier way to solve this problem, either by beefing up the -exec of my find, or by using rsync in a clever way.
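For reference, the helper would then be plugged into find roughly like this (a sketch; it recreates each file's relative path, including fromdir itself, underneath todir):
find fromdir \( -name '*.foo' -o -name '*.bar' -o -name 'baz*' \) -exec mdcp1file {} todir \;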

Since you mention that you use zsh, you could just do something like this:
cd /path/to/source/dir
cp --parents **/{*.{foo,bar},baz*}(.) /path/to/destination/dir
Here we make use of:
cp --parents: copies the named files into the destination directory, recreating their directory structure there (see "Bash: Copy named files recursively, preserving folder structure")
**: recursive globbing, matching files across any number of subdirectory levels
BRACE EXPANSION: A string of the form foo{xx,yy,zz}bar is expanded to the individual words fooxxbar, fooyybar and foozzbar. Left-to-right order is preserved. This construct may be nested. Commas may be quoted in order to include them literally in a word.
Glob Qualifier (.): Patterns used for filename generation may end in a list of qualifiers enclosed in parentheses. The qualifiers specify which filenames that otherwise match the given pattern will be inserted in the argument list. The . qualifier selects plain files only.
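If you would rather stay with rsync, as the question mentions, include/exclude filters can restrict what gets copied rather than only excluding things; a rough sketch (untested):
rsync -a --prune-empty-dirs --include='*/' \
      --include='*.foo' --include='*.bar' --include='baz*' \
      --exclude='*' fromdir/ todir/
The --include='*/' rule keeps rsync descending into every directory, --exclude='*' drops everything not matched by an earlier rule, and --prune-empty-dirs avoids creating directories that would end up containing no matched files.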

Related

Git Bash find exec recursively on folders and files containing spaces

Question: In Git Bash on Windows, how would you run the following in a way that it will also search folders with spaces in the name, and execute on files with spaces in the name?
$ find ./ -type f -name '*.png' -exec sh -c 'cwebp -q 75 $1 -o "${1%.png}.webp"' _ {} \;
Context: I'm running Git Bash on Windows, trying to execute a command on all found .png files to convert them to .webp format. It works for all files without spaces in the path, but it fails to find files with spaces in the filename, or files within folders that have spaces in the folder name. A few considerations:
I have many, many levels of folders to iterate through, and I can't run this command separately for each; I really need the recursion to work.
I cannot change the folder names; that would break other dependencies (nor did I create the folder or file names originally, so cut me some slack!).
I arrived here by following the suggestions from this article: https://www.smashingmagazine.com/2018/07/converting-images-to-webp/
The program, to my knowledge, doesn't ship with any built-in recursive command... golly, that'd be handy.
Any help you can provide will be appreciated. Thanks!
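For what it's worth, the usual culprit in a command like this is the unquoted $1 inside the sh -c body; a sketch with both expansions quoted (not tested in Git Bash):
find . -type f -name '*.png' -exec sh -c 'cwebp -q 75 "$1" -o "${1%.png}.webp"' _ {} \;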

How to run a command in a folder and subfolder

I have a large file/folder structure with many levels (without a pattern in the naming convention). How do I run the following command to extract the data from all the folders? The command is:
perl -wne 'while(/[\w\.\-]+#[\w\.\-]+\w+/g){print "$&\n"}' inputfile.txt > outputfile.txt
It works for one input file, but I want it to go through all the text files in folders and subfolders.
I'd use find to call Perl with the "-i" option for in-place editing. With the "-i" option, you can optionally specify an extension for the saved unmodified file; without it, it modifies the file in-place without saving the unmodified file.
find dirs -name \*.txt -exec perl -i.orig -wne 'while(/[\w\.\-]+#[\w\.\-]+\w+/g){print "$&\n"}' {} \;
or (to start up Perl less often) use:
find dirs -name \*.txt -print | xargs perl -i.orig -wne 'while(/[\w\.\-]+#[\w\.\-]+\w+/g){print "$&\n"}'
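If any of the directories or file names may contain whitespace, a null-delimited variant (GNU find and xargs) is safer:
find dirs -name \*.txt -print0 | xargs -0 perl -i.orig -wne 'while(/[\w\.\-]+#[\w\.\-]+\w+/g){print "$&\n"}'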
Alternatively, you can use the File::Find module to walk the directory tree and then do your own in-place editing, but I think the above method is easier if you are on UNIX/Linux. (If on Windows, you might have to go this way.)

What order does find(1) list files in?

On extfs, if there are only file creations and no deletions in a directory, I would expect find . -type f to list the files either in chronological order of creation (or mtime), or at least in reverse chronological order, depending on how a directory's contents are traversed.
But that isn't the behavior I'm seeing.
The following code, for example, creates a fresh set of directories and files:
#!/bin/bash -u
for i in a/ a/{1,2,3,4,5} b/ b/{1,2,3,4,5}; do
    if echo "$i" | egrep -q "/$"; then
        echo "Creating dir $i"
        mkdir -p "$i"
    else
        echo "Creating file $i"
        touch "$i"
    fi
    sleep 0.500
done
Output of the above snippet:
Creating dir a/
Creating file a/1
Creating file a/2
Creating file a/3
Creating file a/4
Creating file a/5
Creating dir b/
Creating file b/1
Creating file b/2
Creating file b/3
Creating file b/4
Creating file b/5
However, find lists the files in a somewhat random order. For example, a/2 doesn't follow a/1, and b/2 doesn't follow b/1:
$ find . -type f
./a/1
./a/3
./a/4
./a/2 <----
./a/5
./b/1
./b/3
./b/4
./b/2 <----
./b/5
Any idea why this should happen?
My main problem is: I have a very large volume storing hundreds of thousands of files. I need to traverse these files and directories in the order of their creation/modification (mtime) and pipe each file to another process for further processing, but I don't necessarily want to first create a temporary list of this large set of files and then sort it by mtime before piping it to my process.
find lists objects in the order that they are reported by the underlying filesystem implementation. You can tell ls to show you this "raw" order by passing it the -f option.
The order could be anything at all -- alphabetical, by mtime, by atime, by length of name, by permissions, or something completely different. The ordering can even vary from one listing to the next.
It's common for filesystems to report in an order that reflects the filesystem's strategy for allocating directory slots to files. If this is some sort of hash-based strategy based on filename then the order can appear nonsensical. This is what happens with widely-used Linux and BSD filesystem implementations. Since you mention extfs this is probably what causes the ordering you're seeing.
So, if you need the output from find to be ordered in a particular way then you'll have to create that order yourself. Maybe based on something like:
find . -type f -exec ls -ltr --time-style=+%s {} \; | sort -n -k6
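With GNU find you can also have find print the mtime itself, sort on it, and strip it off afterwards, which avoids running ls once per file (this assumes file names without embedded newlines):
find . -type f -printf '%T@ %p\n' | sort -n | cut -d' ' -f2-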

How do I do a recursive find & replace within an SVN checkout?

How do I find and replace every occurrence of:
foo
with
bar
in every text file under the /my/test/dir/ directory tree (recursive find/replace).
BUT I want to be able to do it safely within an SVN checkout and not touch anything inside the .svn directories.
Similar to this but now with the SVN restriction: Awk/Sed: How to do a recursive find/replace of a string?
There are several possibilities:
Using find:
Using find to create a list of all files, and then piping them to sed or the equivalent, as suggested in the answer you reference, is fairly straightforward, and only requires scanning through the files once.
You'd use one of the same answers as from the question you referenced, but adding -path '*/.svn' -prune -o after the find . in order to prune out the SVN directories. See this question for a discussion of using the prune option with find -- although note that they've got the pattern wrong. Thus, to print out all the files, you would use:
find . -path '*/.svn' -prune -o -type f -print
Then, you can pipe that into an xargs call or whatever to do the individual replacements, as suggested in the question you referenced. There is a lot of discussion there about different options, which I won't reproduce here, although I prefer the version from John Zwinck's answer:
find . -path '*/.svn' -prune -o -type f -exec sed -i 's/foo/bar/g' {} +
Using recursive grep:
If you have a system with GNU grep, you can use that to find the list of files as well. This is probably less efficient than find, but it does allow you to only call sed on the files that match, and I personally find the syntax a lot easier to remember (or figure out from manpages):
sed -i 's/foo/bar/g' `grep -l -R --exclude-dir='.svn' 'foo' .`
The -l option causes grep to only output the list of file names, rather than the matching lines.
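A variant that copes with spaces in file names, assuming GNU grep, xargs and sed, passes the list null-delimited instead of using command substitution:
grep -lRZ --exclude-dir='.svn' 'foo' . | xargs -0 sed -i 's/foo/bar/g'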
Using a GUI editor:
Alternately, if you're using Windows, do what I do -- get a copy of the NoteTab editor (available in a free version) and use its search-and-replace-on-disk command, which ignores hidden .svn directories automatically and just works.
Edit: Corrected find pattern to */.svn instead of .svn, added more details and some other possibilities. However, this depends on your platform and svn version: .svn without */ may be required in some cases, like on CentOS 7.
How about this?
grep -i "search_string" `find "*.some_extension"`
That is a halfway solution to finding search_string within files that have a specific extension. Once you know which files contain the string, the command can easily be modified to pipe the results into sed...
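To complete that idea, the matching files could be fed straight into sed; one possible sketch (GNU find and sed; search_string and replacement are placeholders):
find /my/test/dir -name '*.some_extension' -not -path '*/.svn/*' -exec grep -il 'search_string' {} + | xargs sed -i 's/search_string/replacement/g'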

using grep and find commands - basic questions to help me sort it out in my simple mind

I am back with a second no-brainer question, but I would like to get this straight in my head.
I have an assignment in which I am charged with providing a command to find a file named test in my home directory (one command using find, and one using grep). I understand that using find is just 'find ~/test', but using grep, wouldn't I have to search out a pattern within the file 'test'? Or is there a way to search for the file (using grep), even if the file is empty?
ls ~ | grep test
I understand that using find is just 'find ~/test'
No. find ~/test will also match every file and directory under the directory $HOME/test/. Use find ~ -type f -name test instead.
The assignment sounds unclear. But yes, if you give any filenames to grep, it will look at the contents of the files and ignore the names of the files. Perhaps you can grep the output of another command? Maybe ls, as @Reese suggested, or maybe a different find command.
ls -R ~ | grep test
Explanation: ls -R ~ will recursively list all files and directories in your home folder. grep test will narrow down that list to files (and directories) that have "test" in their name.