I want to remove millions of files in a directory and pages mentioned that the following Perl code is the fastest:
perl -e 'chdir "BADnew" or die; opendir D, "."; while ($n = readdir D) { unlink $n }'
However, is it also possible to do this on only files containing the word 'sorted'? Does anyone know how to rewrite this?
It can be done using a find and grep combination:
find BADnew -type f -exec grep -q sorted {} \; -exec rm {} \;
The second -exec command will be executed only if the return code of the first one is zero.
You can do a dry run first:
find BADnew -type f -exec grep -q sorted {} \; -exec echo {} \;
The core module File::Find will recursively traverse all the subdirectories and run a subroutine on every file found:
perl -MFile::Find -e 'find( sub { open $f,"<",$_; unlink if grep /sorted/, <$f> }, "BADnew")'
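For readability, the same one-liner can be spread out into a small script; this is just a sketch of the same approach, with a -f test and an open error check added as my own defensive additions:
perl -MFile::Find -e '
    find(sub {
        return unless -f;                   # skip directories and other non-files
        open my $fh, "<", $_ or return;     # File::Find chdirs in, so $_ is the bare name
        unlink $_ if grep /sorted/, <$fh>;  # delete the file if any line matches
    }, "BADnew")'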
Try:
find /where -type f -name \* -print0 | xargs -0 grep -lZ sorted | xargs -0 echo rm
# -name \*  - can search for specific names (* means any name)
# sorted    - the string the files should contain
# echo rm   - remove the echo when satisfied with the result
The above:
the find searches for files with the specified name (* - any)
the xargs ... grep lists the files that contain the string
the xargs rm removes the files
it doesn't die on "Argument list too long"
the file names may contain whitespace
it needs a grep that knows the -Z option
Also a variant:
find /where -type f -name \* -print0 | xargs -0 grep -lZ sorted | perl -0 -nle unlink
You haven't made it clear, despite specific questions, whether you require the file name or the file contents to contain sorted. Here are both solutions.
First, chdir to the directory you're interested in. Even if you really need a one-liner for whatever reason, it is pointless to put the chdir inside the program.
cd BADnew
Then you can either unlink all nodes that are files and whose names contain sorted:
perl -e'opendir $dh, "."; while(readdir $dh){-f and /sorted/ and unlink}'
or you can open each file and read it to see if its contents contain sorted. I hope it's clear that this method will be far slower, not least because you have to read the entire file to establish a negative. Note that this solution relies on setting @ARGV so that the <> operator reads each candidate file in turn:
perl -e'opendir $dh, "."; while(readdir $dh){-f or next; #ARGV=$f=$_; /sorted/ and unlink($f),last while <>}'
Related
Using Fish, how can I delete the contents of a directory except for certain files (or directories)? Something like rm !(file1|file2) from bash, but fishier.
There is no such feature in fish - that's issue #1444.
You can do something like
rm (string match -rv '^file1$|^file2$' -- *)
Note that this will fail on filenames with newlines in them.
Or you can do the uglier:
set -l files *
for file in file1 file2
    if set -l index (contains -i -- $file $files)
        set -e files[$index]
    end
end
rm $files
which should work no matter what the filenames contain.
Or, as mentioned in that issue, you can use find, e.g.
find . -mindepth 1 -maxdepth 1 -type f -a ! \( -name 'file1' -o -name 'file2' \)
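That command only lists the files to remove; to actually delete them you could let find do the removal as well (a sketch, using the portable -exec form):
find . -mindepth 1 -maxdepth 1 -type f -a ! \( -name 'file1' -o -name 'file2' \) -exec rm -- {} +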
I'm looking for a single line shell script or unix command to find the newest 500 files in a directory tree. Major constraints are it should be POSIX compliant and the directory can have tons of files.
I found from the below link a perl script which helped:
find . -type f -print | perl -l -ne ' $_{$_} = -M; END { $,="\n"; print sort {$_{$b} <=> $_{$a}} keys %_ }' | head -n 500
How to recursively find and list the latest modified files in a directory with subdirectories and times?
Any more comments are most welcome. Thanks all.
How about this:
POSIX ls and head:
ls -tc DIR | head -n 500
find . -type f -print | perl -l -ne ' $_{$_} = -M; END { $,="\n"; print sort {$_{$b} <=> $_{$a}} keys %_ }' | head -n 500
The sort should go the other way around: $_{$a} <=> $_{$b}
The head can be avoided: print+(...)[0..499]
The find can be avoided too, with a recursive call:
perl -e 'sub R{($_)=@_;map{-d$_?&R($_):$_}<$_/*>}print$_,$/for(sort{-M$a<=>-M$b}R".")[0..499]'
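Putting the first two corrections together while keeping the original find, the pipeline would read something like this (a sketch; the grep defined just avoids blank lines when there are fewer than 500 files):
find . -type f -print | perl -l -ne '$_{$_} = -M; END { print for grep defined, (sort { $_{$a} <=> $_{$b} } keys %_)[0..499] }'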
Or with a Unix command (not sure; it may fail if there are too many arguments):
find . -type f -exec ls -1t {} + | head -500
find . -type f -print0 | xargs -0 ls -1t | head -500
find . -type f -exec stat -c %Y:%n {} \; |
sort -rn | sed -e 's/.*://' -e 500q
This sorts on mtime (%Y); it can be changed to ctime or atime by using %Z or %X in the format string, but stat is not POSIX.
There is no 100% reliable POSIX way of doing this with shell scripting.
A POSIX C program will do it easily though, assuming you define newest by either last modified file content or last changed file. If you mean last creation time, there is no POSIX way and possibly no solution at all, depending on the file system used.
I'm trying to change the name of "my-silly-home-page-name.html" to "index.html" in all documents within a given master directory and subdirs.
I saw this: Shell script - search and replace text in multiple files using a list of strings.
And this: How to change all occurrences of a word in all files in a directory
I have tried this:
grep -r "my-silly-home-page-name.html" .
This finds the lines on which the text exists, but now I would like to replace 'my-silly-home-page-name' with 'index'.
How would I do this with sed or perl?
Or do I even need sed/perl?
Something like:
grep -r "my-silly-home-page-name.html" . | sed 's/$1/'index'/g'
?
Also; I am trying this with perl, and I try the following:
perl -i -p -e 's/my-silly-home-page-name\.html/index\.html/g' *
This works, but I get an error when perl encounters directories, saying "Can't do inplace edit: SOMEDIR-NAME is not a regular file, <> line N"
Thanks,
jml
find . -type f -exec \
perl -i -pe's/my-silly-home-page-name(?=\.html)/index/g' {} +
Or if your find doesn't support -exec +,
find . -type f -print0 | xargs -0 \
perl -i -pe's/my-silly-home-page-name(?=\.html)/index/g'
Both pass to Perl as arguments as many names at a time as possible. Both work with any file name, including those that contain newlines.
If you are on Windows and you are using a Windows build of Perl (as opposed to a cygwin build), -i won't work unless you also do a backup of the original. Change -i to -i.bak. You can then go and delete the backups using
find . -type f -name '*.bak' -delete
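Either way, rerunning the grep from the question afterwards is a quick sanity check; it should produce no output once every file has been rewritten:
grep -r "my-silly-home-page-name\.html" .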
This should do the job:
find . -type f -print0 | xargs -0 sed -e 's/my-silly-home-page-name\.html/index\.html/g' -i
Basically it recursively gathers all the files from the given directory (. in the example) with find and, through xargs, runs sed with the same substitution command as the perl command in the question.
Regarding the question about sed vs. perl, I'd say that you should use the one you're more comfortable with since I don't expect huge differences (the substitution command is the same one after all).
There are probably better ways to do this but you can use:
find . -name oldname.html | perl -e 'map { s/[\r\n]//g; $old = $_; s/oldname\.html$/newname.html/; rename $old,$_ } <>';
Fyi, grep searches for a pattern; find searches for files.
I'm working in the shell, trying to find NUL chars in a bunch of CSV files (that Python's CSV importer is whinging about, but that's for another time) using the so-proud-of-my-ever-clever-self:
find ~/path/ -name "*.csv" -print0 | \
xargs -n 1 -0 \
perl -ne 'if(m/\x{00}/){print fileno(ARGV).join(" ",@ARGV).$_;}'
Except I see no filename. Allegedly the implicit <> operator that perl -ne is wrapping my script in is just using @ARGV / the ARGV filehandle, but neither of the above is giving me the name of the current file.
How do I see the current filename (and, ideally, line number) in the above?
$ARGV is the name of the current file and $. is the current line number; see perldoc perlvar and I/O Operators in perldoc perlop. (Note that $. doesn't reset between files; there's discussion of that in perldoc -f eof.)
And I'm not entirely sure what you're trying to accomplish with that print; it will give you the filehandle number, which is probably 3, prepended to a space-separated list of filenames (which should probably be only the one because of xargs -n), then the current line which will include the NUL and other potentially terminal-confusing characters.
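If you do want $. to restart for each file, the idiom from perldoc -f eof is to close ARGV at the end of every file; here is a sketch applied to the NUL search, printing a fixed message instead of the raw line to keep the NULs off the terminal:
find ~/path/ -name "*.csv" -print0 | \
xargs -0 \
perl -ne 'print "$ARGV:$.: NUL byte found\n" if /\x{00}/; close ARGV if eof'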
I'd proceed something like this (I searched .pl files for "x" here):
find -type f -name \*.pl -print0 | \
xargs -0 \
perl -we 'while (<>) { print qq($ARGV\t$.\t$_) if m/x/ }'
And yes, it can be shortened using the -n switch:
find -type f -name \*.pl -print0 | \
xargs -0 \
perl -nwe 'print qq($ARGV\t$.\t$_) if m/x/'
I am interested in getting into bash scripting and would like to know how you can traverse a unix directory and log the path to the file you are currently looking at if it matches a regex criteria.
It would go like this:
Traverse a large unix directory path file/folder structure.
If the current file's contents contained a string that matched one or more regex expressions,
Then append the file's full path to a results text file.
Bash or Perl scripts are fine, although I would prefer to see how you would do this using a bash script with grep, awk, etc.
find . -type f -print0 | xargs -0 grep -l -E 'some_regexp' > /tmp/list.of.files
Important parts:
-type f makes the find list only files
-print0 prints the files separated not by \n but by \0 - it is here to make sure it will work in case you have files with spaces in their names
xargs -0 - splits input on \0, and passes each element as argument to the command you provided (grep in this example)
The cool thing with using xargs is that if your directory really contains a lot of files, you can speed up the process by parallelizing it:
find . -type f -print0 | xargs -0 -P 5 -L 100 grep -l -E 'some_regexp' > /tmp/list.of.files
This will run the grep command in 5 separate copies, each scanning a different set of up to 100 files.
use find and grep
find . -exec grep -l -e 'myregex' {} \; >> outfile.txt
-l on the grep gets just the file name
-e on the grep specifies a regex
{} places each file found by the find command on the end of the grep command
>> outfile.txt appends to the text file
grep -l -R <regex> <location> should do the job.
If you wanted to do this from within Perl, you can take the find commands that people suggested and turn them into a Perl script with find2perl:
If you have:
$ find ...
make that
$ find2perl ...
That outputs a Perl program that does the same thing. From there, if you need to do something that's easy in Perl but hard in shell, you just extend the Perl program.
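For example (a sketch; the findfiles.pl name is arbitrary, and on newer Perls find2perl is no longer bundled and comes from CPAN), you could save the generated program and run it directly:
find2perl . -type f -name '*.txt' -print > findfiles.pl
perl findfiles.pl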
find /path -type f -name "*.txt" | awk '
{
while((getline line<$0)>0){
if(line ~ /pattern/){
print $0":"line
#do some other things here
}
}
}'