Why does "find . -name *.txt | xargs du -hc" give multiple totals? - find

I have a large set of directories for which I'm trying to calculate the sum total size of several hundred .txt files. I tried this, which mostly works:
find . -name *.txt | xargs du -hc
But instead of giving me one total at the end, I get several. My guess is that the pipe will only pass on so many lines of find's output at a time, and du just operates on each batch as it comes. Is there a way around this?
Thanks!
Alex

How about using the --files0-from option to du? You'd have to generate the null-terminated file output appropriately:
find . -name "*txt" -exec echo -n -e {}"\0" \; | du -hc --files0-from=-
works correctly on my system.

find . -iname '*.txt' -print0 | du -hc --files0-from=-
and if you want to search for several different extensions at once, it's best to do:
find . -type f -print0 | grep -azZEi '\.(te?xt|rtf|docx?|wps)$' | du -hc --files0-from=-

The xargs program breaks its input up into batches to stay within the maximum length of a Unix command line. That is still far more efficient than running the subcommand once per file, but for a long list of inputs it will invoke the command several times, with each invocation kept short enough not to hit the limit.
Because of this, you're likely seeing one total per "batch" that xargs needs to run.
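To see the batching in isolation, here's a tiny illustration (the numbers just stand in for file names) that forces batches of two with -n:
seq 1 5 | xargs -n 2 echo batch:
batch: 1 2
batch: 3 4
batch: 5
Each "batch:" line corresponds to one invocation of the command, just as each "total" line in your output corresponds to one invocation of du.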
Because you may find it useful/interesting, the man page can be found online here: http://unixhelp.ed.ac.uk/CGI/man-cgi?xargs
One other thing to note (and this may be a typo in your post or my misunderstanding) is that you have the "*.txt" unescaped/unquoted. I.e., you have
find . -name *.txt | xargs du -hc
where you probably want
find . -name \*.txt | xargs du -hc
The difference is that the shell may expand the * into the list of matching filenames in the current directory, rather than passing the * through to find, which will use it as a pattern.
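For example, if the current directory happened to contain memo.txt and notes.txt (hypothetical names), the shell would expand the unquoted pattern before find ever runs, so
find . -name *.txt | xargs du -hc
effectively becomes
find . -name memo.txt notes.txt | xargs du -hc
With more than one match, GNU find aborts with an error along the lines of "paths must precede expression"; with exactly one match, it silently searches for that literal name only.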

Another simple solution:
find . -name '*.txt' -print0 | xargs -0 du -hc

One alternate solution is to use bash for loop:
for i in `find . -name '*.txt'`; do du -hc "$i" | grep -v 'total'; done
This is good for when you need more control of what happens in the loop.
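If the reason for the loop is ultimately a single grand total, a sketch along these lines (bash, and assuming no newlines in file names) keeps a running sum in kilobytes:
total=0
while IFS= read -r f; do
    size=$(du -k "$f" | cut -f1)    # size of this file in KiB
    total=$((total + size))
done < <(find . -name '*.txt')
echo "${total} KiB total"
Unlike the backtick loop above, the while read form also copes with spaces in file names.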

xargs busts its input into reasonable-sized chunks - what you're seeing are totals for each of those chunks. Check the man page for xargs on ways to configure its handling of input.
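With GNU xargs, for instance, --show-limits reports the command-line limits it is working within, which tells you roughly how many file names fit into each batch:
xargs --show-limits < /dev/null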

One alternate solution is to use awk:
find . -name "*.txt" -exec ls -lt {} \; | awk -F " " 'BEGIN { sum=0 } { sum+=$5 } END { print sum }'

Related

find command for the newest 500 files in a directory tree and also be POSIX compliant

I'm looking for a single line shell script or unix command to find the newest 500 files in a directory tree. Major constraints are it should be POSIX compliant and the directory can have tons of files.
I found a Perl script from the link below which helped:
find . -type f -print | perl -l -ne ' $_{$_} = -M; END { $,="\n"; print sort {$_{$b} <=> $_{$a}} keys %_ }' | head -n 500
How to recursively find and list the latest modified files in a directory with subdirectories and times?
Any more comments are most welcome. Thanks, all.
How about this:
POSIX ls and head
ls -tc DIR | head -n 500
find . -type f -print | perl -l -ne ' $_{$_} = -M; END { $,="\n"; print sort {$_{$b} <=> $_{$a}} keys %_ }' | head -n 500
It should be the contrary for the sort: $_{$a} <=> $_{$b}
The head can be avoided: print+(...)[0..499]
The find can be avoided too, with a recursive call:
perl -e 'sub R{($_)=@_;map{-d$_?&R($_):$_}<$_/*>}print$_,$/for(sort{-M$a<=>-M$b}R".")[0..499]'
Or with a Unix command (not sure; it may fail if there are too many arguments):
find . -type f -exec ls -1t {} + | head -500
find . -type f -print0 | xargs -0 ls -1t | head -500
find . -type f -exec stat -c %Y:%n {} \; |
sort -rn | sed -e 's/.*://' -e 500q
This sorts on mtime, which can be changed by using %Z (ctime) or %X (atime) in the format string, but stat is not POSIX.
There is no 100% reliable POSIX way of doing this with shell scripting.
A POSIX C program will do it easily though, assuming you define newest by either last modified file content or last changed file. If you mean last creation time, there is no POSIX way and possibly no solution at all, depending on the file system used.

Using sed to grab filename from full path?

I'm new to sed, and need to grab just the filename from the output of find. I need to have find output the whole path for another part of my script, but I want to print just the filename without the path. I also need to match starting from the beginning of the line, not from the end. In English: I want to match the first group of characters ending with ".txt" and not containing a "/". Here's my attempt, which doesn't work:
ryan@fizz:~$ find /home/ryan/Desktop/test/ -type f -name \*.txt
/home/ryan/Desktop/test/two.txt
/home/ryan/Desktop/test/one.txt
ryan@fizz:~$ find /home/ryan/Desktop/test/ -type f -name \*.txt | sed s:^.*/[^*.txt]::g
esktop/test/two.txt
ne.txt
Here's the output I want:
two.txt
one.txt
Ok, so the solutions offered answered my original question, but I guess I asked it wrong. I don't want to kill the rest of the line past the file suffix I'm searching for.
So, to be more clear, if the following:
bash$ new_mp3s=`find mp3Dir -type f -name \*.mp3` && cp -rfv $new_mp3s dest
`/mp3Dir/one.mp3' -> `/dest/one.mp3'
`/mp3Dir/two.mp3' -> `/dest/two.mp3'
What I want is:
bash$ new_mp3s=`find mp3Dir -type f -name \*.mp3` && cp -rfv $new_mp3s dest | sed ???
`one.mp3' -> `/dest'
`two.mp3' -> `/dest'
Sorry for the confusion. My original question just covered the first part of what I'm trying to do.
2nd edit:
here's what I've come up with:
DEST=/tmp && cp -rfv `find /mp3Dir -type f -name \*.mp3` $DEST | sed -e 's:[^\`].*/::' -e "s:$: -> $DEST:"
This isn't quite what I want though. Instead of setting the destination directory as a shell variable, I would like to change the first sed operation so it only changes the cp output before the "->" on each line, so that I still have the 2nd part of the cp output to operate on with another '-e'.
3rd edit:
I haven't figured this out using only sed regex's yet, but the following does the job using Perl:
cp -rfv `find /mp3Dir -type f -name \*.mp3` /tmp | perl -pe "s:.*/(.*.mp3).*\`(.*/).*.mp3\'$:\$1 -> \$2:"
I'd like to do it in sed though.
Something like this should do the trick:
find yourdir -type f -name \*.txt | sed 's/.*\///'
or, slightly clearer,
find yourdir -type f -name \*.txt | sed 's:.*/::'
Why don't you use basename instead?
find /mydir | xargs -I{} basename {}
No need for external tools if you are using GNU find:
find /path -name "*.txt" -printf "%f\n"
I landed on this question based on the title: using sed to grab the filename from a full path.
So, using sed, the following is what worked for me...
FILENAME=$(echo "$FULLPATH" | sed -n 's/^\(.*\/\)*\(.*\)/\2/p')
The first group captures any directories from the path. This is discarded.
The second group captures the text following the last slash (/). This is returned.
Examples:
echo "/test/file.txt" | sed -n 's/^\(.*\/\)*\(.*\)/\2/p'
file.txt
echo "/test/asd/asd/entrypoint.sh" | sed -n 's/^\(.*\/\)*\(.*\)/\2/p'
entrypoint.sh
echo "/test/asd/asd/default.json" | sed -n 's/^\(.*\/\)*\(.*\)/\2/p'
default.json
find /mydir | awk -F'/' '{print $NF}'
path="parentdir2/parentdir1/parentdir0/dir/FileName"
name=${path##/*}
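For completeness, the matching expansion ${path%/*} gives the directory part:
echo "${path##*/}"    # FileName
echo "${path%/*}"     # parentdir2/parentdir1/parentdir0/dir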

Odd Sed Error Message

bash-3.2$ sed -i.bakkk -e "s#/sa/#/he/#g" .*
sed: .: in-place editing only works for regular files
I'm trying to replace every /sa/ with /he/ in every dot-file in a folder. How can I get it working?
Use find with -type f to match only regular files whose names match .*, which also excludes the directories . and .. themselves. -maxdepth 1 prevents find from recursing into subdirectories. You can then use -exec to execute the sed command, using a {} placeholder to tell find where the file names should go.
find . -maxdepth 1 -type f -name '.*' -exec sed -i.bakkk -e "s#/sa/#/he/#g" {} +
Using -exec is preferable over using backticks or xargs as it'll work even on weird file names containing spaces or even newlines—yes, "foo bar\nfile" is a valid file name. An honorable mention goes to find -print0 | xargs -0
find . -maxdepth 1 -type f -name '.*' -print0 | xargs -0 sed -i.bakkk -e "s#/sa/#/he/#g"
which is equally safe. It's a little more verbose, though, and less flexible since it only works for commands where the file names go at the end (which is, admittedly, 99% of them).
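A quick way to see the word-splitting problem the backtick form runs into (with a hypothetical dot-file whose name contains a space):
touch '.foo bar'
sed -i.bakkk -e "s#/sa/#/he/#g" `find . -maxdepth 1 -type f -name '.*'`
Here sed receives two separate arguments, ./.foo and bar, and complains that neither file exists; the -exec and -print0 variants above pass the name through intact.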
Try this one:
sed -i.bakkk -e "s#/sa/#/he/#g" `find .* -type f -maxdepth 0 -print`
This should ignore all directories (e.g., .elm, .pine, .mozilla) and not just . and .. which I think the other solutions don't catch.
The glob pattern .* includes the special directories . and .., which you probably didn't mean to include in your pattern. I can't think of an elegant way to exclude them, so here's an inelegant way:
sed -i.bakkk -e "s$/sa/#/he/#g" $(ls -d .* | grep -v '^\.\|\.\.$')

How can I traverse a directory tree using a bash or Perl script?

I am interested in getting into bash scripting and would like to know how you can traverse a Unix directory tree and log the path to the file you are currently looking at if it matches a regex criterion.
It would go like this:
Traverse a large unix directory path file/folder structure.
If the current file's contents contained a string that matched one or more regex expressions,
Then append the file's full path to a results text file.
Bash or Perl scripts are fine, although I would prefer how you would do this using a bash script with grep, awk, etc commands.
find . -type f -print0 | xargs -0 grep -l -E 'some_regexp' > /tmp/list.of.files
Important parts:
-type f makes the find list only files
-print0 prints the files separated not by \n but by \0 - it is here to make sure it will work in case you have files with spaces in their names
xargs -0 - splits input on \0, and passes each element as argument to the command you provided (grep in this example)
The cool thing about using xargs is that if your directory really contains a lot of files, you can speed up the process by parallelizing it:
find . -type f -print0 | xargs -0 -P 5 -L 100 grep -l -E 'some_regexp' > /tmp/list.of.files
This will run up to 5 copies of grep in parallel, each scanning a set of up to 100 files.
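If you'd rather not guess at the degree of parallelism, GNU coreutils provides nproc (assumed to be installed here), which sizes it to the number of CPUs:
find . -type f -print0 | xargs -0 -P "$(nproc)" -n 100 grep -l -E 'some_regexp' > /tmp/list.of.files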
use find and grep
find . -exec grep -l -e 'myregex' {} \; >> outfile.txt
-l on the grep gets just the file name
-e on the grep specifies a regex
{} places each file found by the find command at the end of the grep command
>> outfile.txt appends to the text file
grep -l -R <regex> <location> should do the job.
If you wanted to do this from within Perl, you can take the find commands that people suggested and turn them into a Perl script with find2perl:
If you have:
$ find ...
make that
$ find2perl ...
That outputs a Perl program that does the same thing. From there, if you need to do something that's easy in Perl but hard in shell, you just extend the Perl program.
find /path -type f -name "*.txt" | awk '
{
while((getline line<$0)>0){
if(line ~ /pattern/){
print $0":"line
#do some other things here
}
}
}'

using find command to search for all files having some text pattern

I use the following find command to find and show all files containing the input text pattern:
find . -type f -print | xargs grep -n "pattern"
I have many project folders, each of which has its own makefile named 'Makefile' (no file extension, just 'Makefile').
How do I use the above command to search for a certain pattern only in the files named Makefile present in all my project folders?
-AD.
-print is not required (at least by the GNU find implementation). The -name argument lets you specify a filename pattern. Hence the command would be:
find . -name Makefile | xargs grep pattern
If you have spaces or odd characters in your directory paths you'll need to use the null-terminated method:
find . -name Makefile -print0 | xargs -0 grep pattern
find . -type f -name 'Makefile' | xargs egrep -n "pattern"
use egrep if you have very long paths
You can avoid the use of xargs by using -exec:
find . -type f -name 'Makefile' -exec egrep -Hn "pattern" {} \;
-H on egrep to output the full path to the matching files.
grep -R "string" /path
Please see this link:
http://rulariteducation.blogspot.in/2016/03/how-to-check-particluar-string-in-linux.html
You can also use the ff command, i.e. ff -p .format. For example: ff -p *.txt
To find big files occupying a lot of disk space, we need to combine multiple commands:
find . -type f | xargs du -sk | sort -n | tail
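If your sort supports -h (GNU coreutils), a human-readable variant of the same idea is:
find . -type f -exec du -h {} + | sort -h | tail -n 20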