How to rename a zero-padded file sequence efficiently in ZSH?

I have a picture sequence named with zero-padded numbers like so:
/path/to/file_07469.jpx
/path/to/file_07470.jpx
/path/to/file_07471.jpx
/path/to/file_07472.jpx
/path/to/file_07473.jpx
/path/to/file_07474.jpx
/path/to/file_07475.jpx
/path/to/file_07476.jpx
/path/to/file_07477.jpx
/path/to/file_07478.jpx
/path/to/file_07479.jpx
/path/to/file_07480.jpx
/path/to/file_07481.jpx
/path/to/file_07482.jpx
This is just an extract; there are thousands of files. I’d like to rename all files from a certain number onward, adding or subtracting X. I’d love to use find with a regex.
#!/bin/zsh
shift=-1000
seqnumstart="$(echo "$1" | grep -Eo "\d+")"
bn="$(basename $1)"
bbn="$(echo "${bn%_*}")"
ext="$(echo "${bn##*.}")"
find "$(dirname $1)" -name "$bbn*$ext" -print0 | while read -d $'\0' file
do
seqnum="$(echo "$file" | grep -Eo "\d+")"
seqnum="$(echo "${seqnum#"${seqnum%%[!0]*}"}")"
if [[ "$seqnum" -ge "$seqnumstart" ]]; then
seqnumnew=$(($seqnum + $shift))
seqnumnew=$(printf %05d $seqnumnew)
filenew="$(echo $file | sed -E 's [0-9]+ '$seqnumnew' g')"
mv "$file" "$filenew"
fi
done
How can I improve my code? It is very slow. I'm on a Mac (zsh).

zmv is a utility in zsh that can do a lot of filename manipulation and looping for you. Try this:
zmv -n 'p/file_(<7000-7999>).jpx' 'p/file_$(printf "%05d" $(($1 - 1000))).jpx'
Some of the pieces:
zmv: an autoload function; use autoload -Uz zmv to make it available (this is usually added to .zshrc).
-n: no-op. With this option, zmv will just print what would have happened, giving you an idea if the command is correct. Remove this to actually mv the files.
(...): grouping operator for zmv. This identifies sections in the name that you want to change; this section is referenced in the 'to' argument as $1.
<7000-7999>: glob operator for a range. Note that leading zeroes are not always required.
$(printf "%05d" ...): zero-padding.
$((...)): arithmetic.
$1: reference to the parenthetical value in the 'from' argument. This is where zmv's magic happens - this is substituted for each matching filename.
As you likely know, you'll need to do the renaming in groups or in a specific order to avoid trying to change a name to a name that already exists. zmv will usually halt when it encounters collisions like that.
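The ordering point can also be sketched in plain shell. The function below is only an illustration and is not part of zmv: the name shift_sequence and its arguments are made up for this example, and it assumes five-digit padding, a single "_" directly before the number, and filenames without newlines. When shifting up it renames the highest number first, and when shifting down the lowest first, so each target name has been vacated before it is needed.

```shell
#!/bin/sh
# shift_sequence DIR PREFIX EXT START DELTA   (hypothetical helper)
# Renames DIR/PREFIX_NNNNN.EXT to NNNNN + DELTA for every NNNNN >= START.
shift_sequence() {
    dir=$1 prefix=$2 ext=$3 start=$4 delta=$5
    # Shifting up: rename the highest number first. Shifting down: lowest first.
    if [ "$delta" -gt 0 ]; then order=-r; else order=; fi
    printf '%s\n' "$dir/${prefix}_"*".$ext" | sort $order |
    while IFS= read -r file; do
        [ -e "$file" ] || continue
        num=${file##*_}            # e.g. 07469.jpx
        num=${num%".$ext"}         # e.g. 07469
        case $num in ''|*[!0-9]*) continue ;; esac
        n=${num#"${num%%[!0]*}"}   # strip leading zeros
        n=${n:-0}
        [ "$n" -ge "$start" ] || continue
        new=$(printf '%s/%s_%05d.%s' "$dir" "$prefix" "$((n + delta))" "$ext")
        # Never overwrite a file that is outside the range being moved.
        if [ -e "$new" ]; then
            echo "skipping $file: $new already exists" >&2
            continue
        fi
        mv -- "$file" "$new"
    done
}
```

For the files in the question, a call might look like shift_sequence /path/to file jpx 7469 -1000. Everything happens in one shell process plus one mv per file, which avoids most of the per-file grep, sed, and echo subprocesses of the original script.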

This is much faster:
#!/bin/zsh
shift=1000
seqnumstart="$(echo "$1" | grep -Eo "\d+")"
lastfile="$(find "$(dirname $1)" -name "*.jpx" | sort | tail -1)"
seqnumend="$(echo "$lastfile" | grep -Eo "\d+")"
bn="$(basename $1)"
bbn="$(echo "${bn%_*}")"
#extension
ext="$(echo "${bn##*.}")"
#basepath before the padded number
bp="$(echo "${1%_*}")"
function buildpath {
echo "$bp"_"$1"."$ext"
}
for i in {$seqnumstart..$seqnumend}
do
unpad="$(echo $i | sed 's/^0*//')"
seqnumnew="$(($unpad + $shift))"
seqnumnewpad="$(printf %05d $seqnumnew)"
op="$(buildpath "$i")"
np="$(buildpath "$seqnumnewpad")"
mv "$op" "$np"
done

Related

How to rename all the files (without for loop) in a single line command?

I want to rename all the files in my home directory (example abc), in the format (abc_bkp) without using any loops and it should be a single line command in unix (bash script).

If the directory contains nothing but files, this should do it:
ls | xargs -I {} mv {} {}_bkp
If it contains subdirectories, links, and other things you don't want to rename, you must filter the output of ls. Here is a crude way to do it; maybe someone can suggest a more elegant approach:
ls -l | grep ^- | cut -d' ' -f 13 | xargs -I {} mv {} {}_bkp

If you don't want to use loops, then I believe the best way is the find command. Try the following command as a dry run first; once you are satisfied with the results, remove echo from it to run it for real.
find -type f -or -type d | xargs -I % echo mv % %_bkp
-I: From the xargs man page:
-I replace-str
Replace occurrences of replace-str in the initial-arguments with names read from standard input. Also, unquoted blanks do not terminate input items; instead the separator is the newline character. Implies -x and -L 1.
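A variant that also copes with spaces in names, as a sketch only: it assumes a find and xargs that support -maxdepth, -print0 and -0 (GNU and BSD versions do, but these are not POSIX), and the function name add_bkp_suffix is invented for this example. It renames regular files directly inside the given directory and leaves subdirectories alone.

```shell
#!/bin/sh
# add_bkp_suffix DIR   (hypothetical helper)
# Appends _bkp to every regular file directly inside DIR, without a shell loop.
add_bkp_suffix() {
    # NUL-delimited names keep spaces and other odd characters intact.
    find "$1" -maxdepth 1 -type f ! -name '*_bkp' -print0 |
        xargs -0 -I {} mv {} {}_bkp
}
```

The body of the function is itself a single pipeline, so it can be pasted as a one-liner with "$1" replaced by the directory name.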

Dynamically building an exclude list for both rsync & egrep format

I wonder if anyone out there can help me solve an issue.
I have written a set of shell scripts to audit remote file systems against a GOLD build on an audit server.
As part of this, I do the following:
1) Use rsync to work out any new files or directories, any modified or removed files
2) Use find ${source_filesystem} -ls on both local & remote to work out permissions differences
Now as part of this there are certain files or directories that I am excluding, i.e. logs, trace files etc.
So in order to achieve this I use 2 methods:
1) RSYNC - I have an exclude-list that is added using --exclude-from flag
2) find -ls - I use an egrep -v statement to exclude the same entries as the rsync exclude-list:
e.g. find -L ${source_filesystem} -ls | egrep -v "$SEXCLUDE_supt"
So my issue is that I have to maintain 2 separate lists, and this is a bit of an admin nightmare.
I am looking for advice on whether it is possible to dynamically build a list of exclusions that can be used for both rsync and find -ls.
Here is the format of the exclude lists:
RSYNC:
*.log
*.out
*.csv
logs
shared
tracing
jdk*
8.6_Code
rpsupport
dbarchive
inarchive
comms
PR116PICL
**/lost+found*/
dlxwhsr*
regression
tmp
working
investigation
Investigation
dcsserver_weblogic_*.ear
dcswebrdtEAR_weblogic_*.ear
FIND:
SEXCLUDE_supt="\.log|\.out|\.csv|logs|shared|PR116PICL|tracing|lost\+found|jdk|8\.6\_Code|rpsupport|dbarchive|inarchive|comms|dlxwhsr|regression|tmp|working|investigation|Investigation|dcsserver_weblogic_|dcswebrdtEAR_weblogic_"

You don't need to create a second list for your find command. grep can handle a list of patterns using the -f flag. From the manual:
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file contains zero
patterns, and therefore matches nothing. (-f is specified by POSIX.)
Here's what I'd do:
find -L ${source_filesystem} -ls | grep -Evf your_rsync_exclude_file_here
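This should also work for filenames containing spaces. Note that the rsync patterns are shell-style globs, so entries such as *.log may not behave as intended when grep reads them as regular expressions. Please let me know how it goes.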

In the end grep -Evf was a bit of a nightmare: rsync doesn't support regular expressions; its exclude patterns look similar but use a different syntax.
So I then pursued my other idea of dynamically building the exclude list for egrep by parsing the rsync exclude-list and building a variable on the fly to pass into egrep.
This is the method I used:
#!/bin/ksh
# Create Signature of current build
AFS=$1
#Create Signature File
crSig()
{
find -L ${SRC} -ls | egrep -v "$SEXCLUDE" | awk '{fws = ""; for (i = 11; i <= NF; i++) fws = fws $i " "; print $3, $6, fws}' | sort >${BASE}/${SIFI}.${AFS}
}
#Setup SRC, TRG & SCROOT
LoadAuditReqs()
{
export SRC=`grep ${AFS} ${CONF}/fileSystem.properties | awk {'print $2'}`
export TRG=`grep ${AFS} ${CONF}/fileSystem.properties | awk {'print $3'}`
export SCROOT=`grep ${AFS} ${CONF}/fileSystem.properties | awk {'print $4'}`
export BEXCLUDE=$(sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' ${CONF}/exclude-list.${AFS} | tr "\n" "|")
export SEXCLUDE=$(echo ${BEXCLUDE} | sed 's/\(.*\)|/\1/')
}
#Load Properties File
LoadProperties()
{
. /users/rpapp/rpmonit/audit_tool/conf/environment.properties
}
#Functions
LoadProperties
LoadAuditReqs
crSig
So with these new variables:
export BEXCLUDE=$(sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' ${CONF}/exclude-list.${AFS} | tr "\n" "|")
export SEXCLUDE=$(echo ${BEXCLUDE} | sed 's/\(.*\)|/\1/')
I use them to remove "*" and "/", then match my special characters and prepend "\" to escape them.
Then tr replaces each newline with "|", and a second sed pass removes the trailing "|" to produce the $SEXCLUDE variable that egrep uses in the crSig function.
What do you think?
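For comparison, here is a slightly shorter sketch of the same conversion. It is an illustration rather than a drop-in replacement: the function name build_exclude_regex is made up, it escapes only characters that are special in extended regular expressions (backslashes in the list are not handled), and it uses paste to join the lines so there is no trailing "|" to remove afterwards.

```shell
#!/bin/sh
# build_exclude_regex FILE   (hypothetical helper)
# Prints one egrep-style alternation built from an rsync exclude list.
build_exclude_regex() {
    # 1. drop the glob characters * and /
    # 2. drop lines that are now empty (an empty alternative would match everything)
    # 3. backslash-escape extended-regex metacharacters
    # 4. join the remaining lines with |
    sed -e 's|[*/]||g' \
        -e '/^[[:space:]]*$/d' \
        -e 's/[][+.?(){}^$|]/\\&/g' "$1" | paste -sd '|' -
}
```

It could then be used as find -L "$SRC" -ls | grep -Ev "$(build_exclude_regex "$CONF/exclude-list.$AFS")", with the rsync exclude list remaining the single file to maintain.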

perl -pe to manipulate filenames

I was trying to do some quick filename cleanup at the shell (zsh, if it matters). Renaming files. (I'm using cp instead of mv just to be safe)
foreach f (\#*.ogg)
cp $f `echo $f | perl -pe 's/\#\d+ (.+)$/"\1"/'`
end
Now, I know there are tools to do stuff like this, but for personal interest I'm wondering how I can do it this way. Right now, I get an error:
cp: target `When.ogg"' is not a directory
Where 'When.ogg' is the last part of the filename. I've tried adding quotes (see above) and escaping the spaces, but nonetheless this is what I get.
Is there a reason I can't use the output of a perl one-liner as the final argument to another command line tool?

It looks like you have a space in the file names being processed, so each of your cp command lines evaluates to something like
cp \#nnnn When.Ogg When.ogg
When the cp command sees more than two arguments, the last one must be a target directory name for all the files to be copied to - hence the error message. Because your source filename ($f) contains a space it is being treated as two arguments - cp sees three args, rather than the two you intend.
If you put double quotes around the first $f that should prevent the two 'halves' of the name from being treated as separate file names:
cp "$f" `echo ...

This is what you need in bash, hope it's good for zsh too.
cp "$f" "`echo $f | perl -pe 's/\#\d+ (.+)$/\1/'`"
If the filename contains spaces, you also have to quote the second argument of cp.
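The same quoting rule applies when the new name is computed in the shell itself. The sketch below swaps the perl substitution for a parameter expansion, which is a different technique from the one in the question; it assumes names of the form "#<number> <title>.ogg", and the function name copy_without_track_prefix is invented for this example.

```shell
#!/bin/sh
# copy_without_track_prefix DIR   (hypothetical helper)
# Copies "DIR/#12 Some Title.ogg" to "DIR/Some Title.ogg".
copy_without_track_prefix() {
    for f in "$1"/\#*.ogg; do
        [ -e "$f" ] || continue
        base=${f##*/}
        # Remove the shortest leading match of "#...<space>".
        new=${base#\#* }
        # Nothing was removed: skip rather than copy a file onto itself.
        [ "$new" != "$base" ] || continue
        # Both arguments are quoted, so spaces in either name are safe.
        cp -- "$f" "$1/$new"
    done
}
```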

I often use
dir /b ... | perl -nle"$o=$_; s/.../.../; $n=$_; rename $o,$n if !-e $n"
The -l chomps the input.
The -e check is to avoid accidentally renaming all the files to one name. I've done that a couple of times.
In bash (and I'm guessing zsh), that would be
foreach f (...)
echo "$f" | perl -nle'$o=$_; s/.../.../; $n=$_; rename $o,$n if !-e $n'
end
or
find -name '...' -maxdepth 1 \
| perl -nle'$o=$_; s/.../.../; $n=$_; rename $o,$n if !-e $n'
or
find -name '...' -maxdepth 1 -exec \
perl -e'for (@ARGV) {
$o=$_; s/.../.../; $n=$_;
rename $o,$n if !-e $n;
}' {} +
The last supports file names with newlines in them.

Unable to filter rows which contain "Is a directory" by SED/AWK

I run the following code, which gives me the sample data below
md5deep find * | awk '{ print $1 }'
A sample of the output
/Users/math/Documents/Articles/Number theory: Is a directory
258fe6853b1bfb2d07f512ff6bec52b1
/Users/math/Documents/Articles/Probability and statistics: Is a directory
4811bfb2ad04b9f4318049c01ebb52ef
8aae4ac3694658cf90005dbdea37b4d5
258fe6853b1bfb2d07f512ff6bec52b1
I have tried to filter the rows which contain Is a directory by SED unsuccessfully
md5deep find * | awk '{ print $1 }' | sed s/\/*//g
Its sample output is
/Users/math/Documents/Articles/Number theory: Is a directory
/Users/math/Documents/Articles/Topology: Is a directory
/Users/math/Documents/Articles/useful: Is a directory
How can I filter out each row which contains "Is a directory" by SED/AWK?
[clarification]
I want to filter out the rows which contain Is a directory.

I have not used the md5deep tool, but I believe those lines are error messages; they would be going to standard error instead of standard out, and so they are going directly to your terminal instead of through the pipe. Thus, they won't be filtered by your sed command. You could filter them by merging your standard error and standard output streams, as described further down.
It looks like (I'm not sure because you are missing the backquotes) you are trying to call
md5deep `find *`
and find is returning all of the files and directories.
Some notes on what you might want to do:
It looks like md5deep has a -r for "recursive" option. So, you may want to try:
md5deep -r *
instead of the find command.
If you do wish to use a find command, you can limit it to only files using -type f, instead of files and directories. Also, you don't need to pass * into a find command (which may confuse find if there are files whose names look like options that find understands); passing in . will search recursively through the current directory.
find . -type f
In sed if you wish to use slashes in your pattern, it can be a pain to quote them correctly with \. You can instead choose a different character to delimit your regular expression; sed will use the first character after the s command as a delimiter. Your pattern is also lacking a .; in regular expressions, to indicate one instance of any character you use ., and to indicate "zero or more of the preceding expression" you use *, so .* indicates "zero or more of any character" (this is different from glob patterns, in which * alone means "zero or more of any character").
sed "s|/.*||g"
If you really do want to be including your standard error stream in your standard output, so it will pass through the pipe, then you can run:
md5deep `find *` 2>&1 | awk ...
If you just want to ignore stderr, you can redirect that to /dev/null, which is a special file that just discards anything that goes into it:
md5deep `find *` 2>/dev/null | awk ...
In summary, I think the command below will help you with your immediate problem, and the other suggestions listed above may help you if I did not understand what you were looking for:
md5deep -r * | awk '{ print $1 }'
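The stderr behaviour described above can be checked without md5deep. In this sketch, emit is a made-up stand-in that writes one hash-like line to standard output and one "Is a directory" message to standard error; the variable names are likewise only for illustration.

```shell
#!/bin/sh
# emit: hypothetical stand-in for a tool that reports directories on stderr.
emit() {
    echo "258fe6853b1bfb2d07f512ff6bec52b1"
    echo "./docs: Is a directory" >&2
}

# stderr is thrown away, so only the hash line is captured.
discarded=$(emit 2>/dev/null)

# stderr is merged into stdout, so both lines are captured.
merged=$(emit 2>&1)

# Once merged, the message passes through the pipe and can be filtered.
filtered=$(emit 2>&1 | grep -v 'Is a directory')

printf 'discarded: %s\nfiltered:  %s\n' "$discarded" "$filtered"
```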

To specifically answer the clarification: how to filter out lines using awk and sed:
awk '/Is a directory/ {next} {print}'
sed '/Is a directory/d'

Why not use grep instead?
ie,
md5deep find * | grep "Is a directory" | awk '{ print $1 }'
Edit: I just re-read your question and if you want to remove the lines with Is a directory, use the -v flag of grep, ie:
md5deep find * | grep -v "Is a directory" | awk '{ print $1 }'

I'm not intimately familiar with md5deep, but this may do something like what you are trying to do.
find -type f -exec md5sum {} +

How can I show lines in common (reverse diff)?

I have a series of text files for which I'd like to know the lines in common rather than the lines which are different between them. Command line Unix or Windows is fine.
File foo:
linux-vdso.so.1 => (0x00007fffccffe000)
libvlc.so.2 => /usr/lib/libvlc.so.2 (0x00007f0dc4b0b000)
libvlccore.so.0 => /usr/lib/libvlccore.so.0 (0x00007f0dc483f000)
libc.so.6 => /lib/libc.so.6 (0x00007f0dc44cd000)
File bar:
libkdeui.so.5 => /usr/lib/libkdeui.so.5 (0x00007f716ae22000)
libkio.so.5 => /usr/lib/libkio.so.5 (0x00007f716a96d000)
linux-vdso.so.1 => (0x00007fffccffe000)
So, given these two files above, the output of the desired utility would be akin to file1:line_number, file2:line_number == matching text (just a suggestion; I really don't care what the syntax is):
foo:1, bar:3 == linux-vdso.so.1 => (0x00007fffccffe000)

On *nix, you can use comm. The answer to the question is:
comm -1 -2 file1.sorted file2.sorted
# where file1 and file2 are sorted and piped into *.sorted
Here's the full usage of comm:
comm [-1] [-2] [-3 ] file1 file2
-1 Suppress the output column of lines unique to file1.
-2 Suppress the output column of lines unique to file2.
-3 Suppress the output column of lines duplicated in file1 and file2.
Also note that it is important to sort the files before using comm, as mentioned in the man pages.

I found this answer on a question listed as a duplicate. I find grep to be more administrator-friendly than comm, so if you just want the set of matching lines (useful for comparing CSV files, for instance) simply use
grep -F -x -f file1 file2
Or the simplified fgrep version:
fgrep -xf file1 file2
Plus, you can use file2* to glob and look for lines in common with multiple files, rather than just two.
Some other handy variations include
-n flag to show the line number of each matched line
-c to only count the number of lines that match
-v to display only the lines in file2 that differ (or use diff).
Using comm is faster, but that speed comes at the expense of having to sort your files first. It isn't very useful as a 'reverse diff'.

It was asked here before: Unix command to find lines common in two files
You could also try with Perl (credit goes here):
perl -ne 'print if ($seen{$_} .= @ARGV) =~ /10$/' file1 file2

I just learned the comm command from the answers, but I wanted to add something extra: if the files are not sorted, and you don't want to touch the original files, you can pipe the output of the sort command. This leaves the original files intact. It works in Bash, but I can't say about other shells.
comm -1 -2 <(sort file1) <(sort file2)
This can be extended to compare command output, instead of files:
comm -1 -2 <(ls /dir1 | sort) <(ls /dir2 | sort)

The easiest way to do it is:
awk 'NR==FNR{a[$1]++;next} a[$1] ' file1 file2
The files do not need to be sorted.
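To get output close to the file:line format suggested in the question, the same two-pass awk idea can remember line numbers as well. This is a sketch under a few assumptions: whole lines are compared rather than the first field, only the first occurrence in the first file is recorded, and the function name common_lines is made up.

```shell
#!/bin/sh
# common_lines FILE1 FILE2   (hypothetical helper)
# Prints "FILE1:n, FILE2:m == text" for every line of FILE2 also found in FILE1.
common_lines() {
    awk -v f1="$1" -v f2="$2" '
        # First file: remember the line number of each distinct line.
        NR == FNR { if (!($0 in first)) first[$0] = FNR; next }
        # Second file: report lines that were seen in the first file.
        $0 in first { printf "%s:%d, %s:%d == %s\n", f1, first[$0], f2, FNR, $0 }
    ' "$1" "$2"
}
```

With the two sample files from the question, common_lines foo bar reports the linux-vdso.so.1 line as foo:1, bar:3.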

I think the diff utility itself, using its unified (-U) option, can be used to achieve this effect. Because the first column of diff output marks whether the line is an addition or a deletion, we can look for lines that haven't changed.
diff -U1000 file_1 file_2 | grep '^ '
The number 1000 is chosen arbitrarily, big enough to be larger than any single hunk of diff output.
Here's the full, foolproof set of commands:
f1="file_1"
f2="file_2"
lc1=$(wc -l < "$f1")
lc2=$(wc -l < "$f2")
lcmax=$(( lc1 > lc2 ? lc1 : lc2 ))
diff -U$lcmax "$f1" "$f2" | grep '^ ' | less
# Alternatively, use this grep to ignore the lines starting
# with +, -, and @ signs.
# grep -vE '^[+@-]'
If you want to include the lines that are just moved around, you can sort the input before diffing, like so:
f1="file_1"
f2="file_2"
lc1=$(wc -l < "$f1")
lc2=$(wc -l < "$f2")
lcmax=$(( lc1 > lc2 ? lc1 : lc2 ))
diff -U$lcmax <(sort "$f1") <(sort "$f2") | grep '^ ' | less

In Windows, you can use a PowerShell script with Compare-Object:
compare-object -IncludeEqual -ExcludeDifferent -PassThru (get-content A.txt) (get-content B.txt)> MATCHING.txt | Out-Null #Find Matching Lines
Compare-Object:
IncludeEqual without -ExcludeDifferent: Everything
ExcludeDifferent without -IncludeEqual: Nothing

Just for information, I made a little tool for Windows doing the same thing as "grep -F -x -f file1 file2" (As I haven't found anything equivalent to this command on Windows)
Here it is:
http://www.nerdzcore.com/?page=commonlines
Usage is "CommonLines inputFile1 inputFile2 outputFile"
Source code is also available (GPL).
