Renaming files/directories from one date format to another

So I am a coding newbie and have, for some time, wanted to edit the formatting of my fairly extensive live music library. I have looked around on here and various other resources to get to where I am, but I have hit a snag. I have directories named in the following ways:
02.10.90 | 23 East Caberet - Ardmore, PA
02.16.90 | The Paradise - Boston, MA
and I would like to rename these simply to
1990-02-10 | 23 East Caberet - Ardmore, PA
1990-02-16 | The Paradise - Boston, MA
I have been able to rename the date correctly using:
ls -1 | grep 90 | awk '{print $1}' | awk -F. '{printf "%s-%s-%s\n", "19"$3,$1,$2}' > list1.txt
and then pull the rest of the name using
ls -1 | grep 90 | awk '{first = $1; $1 = ""; print $0}'>list2.txt
So, I have a list of directories ranging from 1990-2004 that I would like to apply this to (they are all in different subdirectories, so I don't mind manually changing the "grep 90"). However, from the two separate lists that I generate, I can't figure out how to loop through each row and print "mv original_name list1.txt+list2.txt" so that it would read:
mv 02.10.90 | 23 East Caberet - Ardmore, PA 1990-02-10 | 23 East Caberet - Ardmore, PA
I scanned through many previous posts and couldn't quite figure out the last bit - or better yet, a more elegant solution! Any help is greatly appreciated, thank you in advance!

Don't parse the output from ls (google why). You don't need grep when you're using awk, nor do you need chains of awk commands, but for this task you wouldn't use any of those commands anyway.
The UNIX command to find files is named find, so start with that. This will find all directories with names starting with the given globbing pattern:
find . -type d -name '[0-9][0-9].[0-9][0-9].90 *'
Now that you've found the files you need to do something with them. For your needs IMHO the simplest approach is best and that'd be:
find . -type d -name '[0-9][0-9].[0-9][0-9].90 *' -print0 |
while IFS= read -r -d '' old; do
    path="$(dirname "$old")"
    oldDirName="$(basename "$old")"
    if [[ $oldDirName =~ ([0-9]+)\.([0-9]+)\.([0-9]+)( .*) ]]; then
        newDirName="19${BASH_REMATCH[3]}-${BASH_REMATCH[1]}-${BASH_REMATCH[2]}${BASH_REMATCH[4]}"
        echo mv -- "${path}/${oldDirName}" "${path}/${newDirName}"
    fi
done
The above uses GNU find for -print0 and bash for BASH_REMATCH. Remove the echo once you've tested it and are happy with what it's going to do.
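Since the rest of the library runs from 1990 to 2004, here is a sketch (untested, so keep the echo in at first) of the same loop generalised to any two-digit year, deriving the century from the year itself instead of hard-coding "19":
find . -type d -name '[0-9][0-9].[0-9][0-9].[0-9][0-9] *' -print0 |
while IFS= read -r -d '' old; do
    path="$(dirname "$old")"
    oldDirName="$(basename "$old")"
    if [[ $oldDirName =~ ([0-9]+)\.([0-9]+)\.([0-9]+)( .*) ]]; then
        yy="${BASH_REMATCH[3]}"
        # 90-99 -> 19xx, anything lower -> 20xx (fits a 1990-2004 collection)
        if (( 10#$yy >= 90 )); then century=19; else century=20; fi
        newDirName="${century}${yy}-${BASH_REMATCH[1]}-${BASH_REMATCH[2]}${BASH_REMATCH[4]}"
        echo mv -- "${path}/${oldDirName}" "${path}/${newDirName}"
    fi
done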

Related

How to rename a zero-padded file sequence efficiently in ZSH?

I have a picture sequence named with zero-padded numbers like so:
/path/to/file_07469.jpx
/path/to/file_07470.jpx
/path/to/file_07471.jpx
/path/to/file_07472.jpx
/path/to/file_07473.jpx
/path/to/file_07474.jpx
/path/to/file_07475.jpx
/path/to/file_07476.jpx
/path/to/file_07477.jpx
/path/to/file_07478.jpx
/path/to/file_07479.jpx
/path/to/file_07480.jpx
/path/to/file_07481.jpx
/path/to/file_07482.jpx
This is just an extract. It is thousands of files. I’d like to rename all files from a certain number on, adding / subtracting X. I’d love to use find with a regex.
#!/bin/zsh
shift=-1000
seqnumstart="$(echo "$1" | grep -Eo "\d+")"
bn="$(basename $1)"
bbn="$(echo "${bn%_*}")"
ext="$(echo "${bn##*.}")"
find "$(dirname $1)" -name "$bbn*$ext" -print0 | while read -d $'\0' file
do
seqnum="$(echo "$file" | grep -Eo "\d+")"
seqnum="$(echo "${seqnum#"${seqnum%%[!0]*}"}")"
if [[ "$seqnum" -ge "$seqnumstart" ]]; then
seqnumnew=$(($seqnum + $shift))
seqnumnew=$(printf %05d $seqnumnew)
filenew="$(echo $file | sed -E 's [0-9]+ '$seqnumnew' g')"
mv "$file" "$filenew"
fi
done
How can I improve my code? It is very slow. I'm on a Mac (zsh).
zmv is a utility in zsh that can do a lot of filename manipulation and looping for you. Try this:
zmv -n 'p/file_(<7000-7999>).jpx' 'p/file_$(printf "%05d" $(($1 - 1000))).jpx'
Some of the pieces:
zmv: an autoload function; use autoload -Uz zmv to make it available (this is usually added to .zshrc).
-n: no-op. With this option, zmv will just print what would have happened, giving you an idea if the command is correct. Remove this to actually mv the files.
(...): grouping operator for zmv. This identifies sections in the name that you want to change; this section is referenced in the 'to' argument as $1.
<7000-7999>: glob operator for a range. Note that leading zeroes are not always required.
$(printf "%05d" ...): zero-padding.
$((...)): arithmetic.
$1: reference to the parenthetical value in the 'from' argument. This is where zmv's magic happens: it is substituted for each matching filename.
As you likely know, you'll need to do the renaming in groups or in a specific order to avoid trying to change a name to a name that already exists. zmv will usually halt when it encounters collisions like that.
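Applied to the paths in the question (assuming the files really do live under /path/to and the shift is -1000, as in the script above), a dry run would look something like:
autoload -Uz zmv
zmv -n '/path/to/file_(<7469-7482>).jpx' '/path/to/file_$(printf "%05d" $(($1 - 1000))).jpx'
Drop the -n once the printed mv commands look right.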
This is much faster:
#!/bin/zsh
shift=1000
seqnumstart="$(echo "$1" | grep -Eo "\d+")"
lastfile="$(find "$(dirname $1)" -name "*.jpx" | sort | tail -1)"
seqnumend="$(echo "$lastfile" | grep -Eo "\d+")"
bn="$(basename $1)"
bbn="$(echo "${bn%_*}")"
#extension
ext="$(echo "${bn##*.}")"
#basepath before the padded number
bp="$(echo "${1%_*}")"
function buildpath {
echo "$bp"_"$1"."$ext"
}
for i in {$seqnumstart..$seqnumend}
do
unpad="$(echo $i | sed 's/^0*//')"
seqnumnew="$(($unpad + $shift))"
seqnumnewpad="$(printf %05d $seqnumnew)"
op="$(buildpath "$i")"
np="$(buildpath "$seqnumnewpad")"
mv "$op" "$np"
done
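For what it's worth, both this script and the one in the question take the first file of the range as their only argument, so (with a made-up script name) a run would look like:
./shift_seq.zsh /path/to/file_07469.jpx
with the shift amount edited at the top of the script.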

finding most recent file version from list of file path names with jumbled file names

I recently lost a bunch of files from eclipse in an accidental copy/replace dilemma. I was able to recover most of them, but I found in the eclipse metadata folder a history of files, some of which are the ones I need. The path for the history is:
($WORKSPACE/.metadata/.plugins/org.eclipse.core.resources/.history).
Inside there are a bunch of folders like 3e, 2f, 1a, ff, etc., each with a couple of files named like "2054f7f9a0d30012175be7013ca49f5b". I was able to do a recursive grep with a keyword I know would be in the file and return a list of file names (grep -R -l 'KEYWORD'), and now I can't figure out how to sort them by most recently modified.
Any help would be great, thanks!
you can try:
find $WORK.../.history -type f -printf '%T@\t%p\n' | sort -nr | cut -f2- | xargs grep 'your_pattern'
Decomposed:
the find finds all plain files and prints their modification time and path
the sort sorts them numerically, in reverse, so the highest number (the latest modification time) comes first
the cut removes the time from each line
the xargs runs its argument for each file that arrives on its input;
in this case it will run the grep command, so
the first file that grep matches is the most recently modified one
The above does not work when the filenames contain spaces, but hopefully this is not your case... The -printf works only with GNU find.
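If spaces (or even newlines) in the filenames were a concern, a NUL-delimited variant along these lines should work, assuming a reasonably recent GNU find, sort and cut (untested sketch):
find $WORK.../.history -type f -printf '%T@\t%p\0' | sort -z -nr | cut -z -f2- | xargs -0 grep 'your_pattern'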
For repetitive use, you can split the command into two parts:
find $WORK.../.history -type f -printf '%T@\t%p\n' | sort -nr | cut -f2- > /somewhere/FILENAMES_SORTED_BY_MODIF_TIME
In the first step you save the list of filenames, sorted by modification time, to a file somewhere; afterwards you can repeatedly grep their content with:
< /somewhere/FILENAMES_SORTED_BY_MODIF_TIME xargs grep 'your_pattern'
the above command is usually written as
xargs grep 'your_pattern' < /somewhere/FILENAMES_SORTED_BY_MODIF_TIME
but for bash it is OK to write the redirection at the start, and in this case it is simpler to change the grep pattern when the pattern is in the last place...
If you want to check the list of filenames with their modification times, you can break the above commands up as:
find $WORK.../.history -type f -printf "%T@\t%Tc\t%p\n" | sort -nr >/somewhere/FILENAMES_WITH_DATE
check the list (the lines now contain a readable date too) and then use:
< /somewhere/FILENAMES_WITH_DATE cut -f3- | xargs grep 'your_pattern'
Note that you now need to use -f3- and not -f2- as in the first example.

Dynamically building an exclude list for both rsync & egrep formats

I wonder if anyone out there can assist me in trying to solve an issue I have.
I have written a set of shell scripts with the purpose of auditing remote file systems based on a GOLD build on an audit server.
As part of this, I do the following:
1) Use rsync to work out any new files or directories, any modified or removed files
2) Use find ${source_filesystem} -ls on both local & remote to work out permissions differences
Now as part of this there are certain files or directories that I am excluding, i.e. logs, trace files etc.
So in order to achieve this I use 2 methods:
1) RSYNC - I have an exclude-list that is added using --exclude-from flag
2) find -ls - I use an egrep -v statement to exclude the same entries as the rsync exclude-list:
e.g. find -L ${source_filesystem} -ls | egrep -v "$SEXCLUDE_supt"
So my issue is that I have to maintain 2 separate lists, and this is a bit of an admin nightmare.
I am looking for some assistance or advice on whether it is possible to dynamically build a list of exclusions that can be used for both the rsync and the find -ls.
Here is what the exclude lists look like:
RSYNC:
*.log
*.out
*.csv
logs
shared
tracing
jdk*
8.6_Code
rpsupport
dbarchive
inarchive
comms
PR116PICL
**/lost+found*/
dlxwhsr*
regression
tmp
working
investigation
Investigation
dcsserver_weblogic_*.ear
dcswebrdtEAR_weblogic_*.ear
FIND:
SEXCLUDE_supt="\.log|\.out|\.csv|logs|shared|PR116PICL|tracing|lost\+found|jdk|8\.6\_Code|rpsupport|dbarchive|inarchive|comms|dlxwhsr|regression|tmp|working|investigation|Investigation|dcsserver_weblogic_|dcswebrdtEAR_weblogic_"
You don't need to create a second list for your find command. grep can handle a list of patterns using the -f flag. From the manual:
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file contains zero
patterns, and therefore matches nothing. (-f is specified by POSIX.)
Here's what I'd do:
find -L ${source_filesystem} -ls | grep -Evf your_rsync_exclude_file_here
This should also work for filenames containing newlines and spaces. Please let me know how it goes.
In the end the grep -Evf approach was a bit of a nightmare, as rsync doesn't support regex; it uses patterns, but not the same kind.
So I then pursued my other idea of dynamically building the exclude list for egrep, by parsing the rsync exclude-list and building a variable on the fly to pass into egrep.
This is the method I used:
#!/bin/ksh
# Create Signature of current build
AFS=$1
#Create Signature File
crSig()
{
    find -L ${SRC} -ls | egrep -v "$SEXCLUDE" | awk '{fws = ""; for (i = 11; i <= NF; i++) fws = fws $i " "; print $3, $6, fws}' | sort >${BASE}/${SIFI}.${AFS}
}
#Setup SRC, TRG & SCROOT
LoadAuditReqs()
{
    export SRC=`grep ${AFS} ${CONF}/fileSystem.properties | awk {'print $2'}`
    export TRG=`grep ${AFS} ${CONF}/fileSystem.properties | awk {'print $3'}`
    export SCROOT=`grep ${AFS} ${CONF}/fileSystem.properties | awk {'print $4'}`
    export BEXCLUDE=$(sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' ${CONF}/exclude-list.${AFS} | tr "\n" "|")
    export SEXCLUDE=$(echo ${BEXCLUDE} | sed 's/\(.*\)|/\1/')
}
#Load Properties File
LoadProperties()
{
    . /users/rpapp/rpmonit/audit_tool/conf/environment.properties
}
#Functions
LoadProperties
LoadAuditReqs
crSig
So with these new variables:
export BEXCLUDE=$(sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' ${CONF}/exclude-list.${AFS} | tr "\n" "|")
export SEXCLUDE=$(echo ${BEXCLUDE} | sed 's/\(.*\)|/\1/')
I use them to remove "*" and "/", then match my special characters and prefix them with "\" to escape them.
Then, using tr, each newline is replaced with "|", and that output is run through sed once more to remove the trailing "|", giving the $SEXCLUDE variable used by egrep in the crSig function.
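For illustration, feeding a few of the sample rsync patterns from the question through that same pipeline (a hypothetical run):
printf '%s\n' '*.log' '8.6_Code' 'dcsserver_weblogic_*.ear' | sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' | tr "\n" "|"
prints \.log|8\.6\_Code|dcsserver\_weblogic\_\.ear| and the second sed then strips the trailing "|".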
What do you think?

Finding most commonly edited files in clearcase

We are currently planning a quality improvement exercise and I would like to target the most commonly edited files in our ClearCase VOBs. Since we have just been through a bug-fixing phase, the most commonly edited files should give a good indication of where the most bug-prone code is, and therefore which is most in need of quality improvement.
Does anyone know if there is a way of obtaining a top 100 list of most edited files? Preferably this would cover edits that are happening on multiple branches.
(The previous answer was for a simpler case: single branch)
Since "most projects dev has not all happened on the one branch so the version numbers don't necessarily mean most edited", a "way to get number of check-ins across all branches" would be:
search all versions created since the date of the last bug fixing phase,
sort them by file,
then by occurrence.
Something along the lines of:
C:\Prog\cc\test\test>ct find -all -type f -ver "created_since(16-Oct-2009)" -exec "cleartool descr -fmt """%En~%Sn\n""""""%CLEARCASE_XPN%"""" | grep -v "\\0" | awk -F ~ "{print $1}" | sort | uniq -c | sort /R | head -100
Or, for Unix syntax:
$ ct find -all -type f -ver 'created_since(16-Oct-2009)' -exec 'cleartool descr -fmt "%En~%Sn\n" "%CLEARCASE_XPN%"' | grep -v "/0" | awk -F ~ '{print $1}' | sort | uniq -c | sort -rn | head -100
replace the date by the one of the label marking the start of your bug-fixing phase
Again, note the double-quotes around the '%CLEARCASE_XPN%' to accommodate spaces within file names.
Here, '%CLEARCASE_XPN%' is used rather than '%CLEARCASE_PN%' because we need every version.
grep -v "/0" is here to exclude version 0 (/main/0, /main/myBranch/0, ...)
awk -F ~ "{print $1}" is used to only print the first part of each line:
C:\Prog\cc\test\test\a.txt~\main\mybranch\2 becomes C:\Prog\cc\test\test\a.txt
From there, the counting and sorting can begin:
sort to make sure every identical line is grouped
uniq -c to remove duplicate lines and precede each remaining line with a count of said duplicates
sort -rn (or sort /R for Windows) for having the most edited files at the top
head -100 for keeping only the 100 most edited files.
Again, GnuWin32 will come in handy for the Windows version of the one-liner.
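As a rough illustration of the counting stage (with made-up file names), the tail end of the pipeline behaves like this:
printf '%s\n' src/a.c src/b.c src/a.c src/a.c src/b.c src/c.c | sort | uniq -c | sort -rn | head -100
which prints something like:
3 src/a.c
2 src/b.c
1 src/c.c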
(See answer for more complicated case: multiple branches)
First, use a dynamic view: easier and quicker to update its content and fiddle with its config spec rules.
If your bug-fixing has been made in a branch, starting from a given label, set-up a dynamic view with the following config spec as:
element * .../MY_BRANCH/LATEST
element * MY_STARTING_LABEL
element * /main/LATEST
Then you find all files, with their current version number (closely related to the number of edits)
ct find . -type f -exec "cleartool desc -fmt """%Ln\t\t%En\n""" """%CLEARCASE_PN%""""|sort /R|head -100
This is the Windows syntax (note the triple "double-quotes" around %CLEARCASE_PN% in order to accommodate spaces within the file names).
the 'head' command comes from the GnuWin32 library.
The most edited files are at the top of the list.
A Unix version would be:
$ ct find . -type f -exec 'cleartool desc -fmt "%Ln\t\t%En\n" "$CLEARCASE_PN"' | sort -rn | head -100
The most edited files would be at the top.
Do not forget that for metrics, the raw numbers are not enough, trends are important too.

Map sd?/sdd? names to Solaris disk names?

Some commands in Solaris (such as iostat) report disk related information using disk names such as sd0 or sdd2. Is there a consistent way to map these names back to the standard /dev/dsk/c?t?d?s? disk names in Solaris?
Edit: As Amit points out, iostat -n produces device names such as c0t0d0s0 instead of sd0. But how do I find out that sd0 actually is c0t0d0s0? I'm looking for something that produces a list like this:
sd0=/dev/dsk/c0t0d0s0
...
sdd2=/dev/dsk/c1t0d0s4
...
Maybe I could run iostat twice (with and without -n) and then join up the results and hope that the number of lines and device sorting produced by iostat is identical between the two runs?
Following Amit's idea to answer my own question, this is what I have come up with:
iostat -x|tail -n +3|awk '{print $1}'>/tmp/f0.txt.$$
iostat -nx|tail -n +3|awk '{print "/dev/dsk/"$11}'>/tmp/f1.txt.$$
paste -d= /tmp/f[01].txt.$$
rm /tmp/f[01].txt.$$
Running this on a Solaris 10 server gives the following output:
sd0=/dev/dsk/c0t0d0
sd1=/dev/dsk/c0t1d0
sd4=/dev/dsk/c0t4d0
sd6=/dev/dsk/c0t6d0
sd15=/dev/dsk/c1t0d0
sd16=/dev/dsk/c1t1d0
sd21=/dev/dsk/c1t6d0
ssd0=/dev/dsk/c2t1d0
ssd1=/dev/dsk/c3t5d0
ssd3=/dev/dsk/c3t6d0
ssd4=/dev/dsk/c3t22d0
ssd5=/dev/dsk/c3t20d0
ssd7=/dev/dsk/c3t21d0
ssd8=/dev/dsk/c3t2d0
ssd18=/dev/dsk/c3t3d0
ssd19=/dev/dsk/c3t4d0
ssd28=/dev/dsk/c3t0d0
ssd29=/dev/dsk/c3t18d0
ssd30=/dev/dsk/c3t17d0
ssd32=/dev/dsk/c3t16d0
ssd33=/dev/dsk/c3t19d0
ssd34=/dev/dsk/c3t1d0
The solution is not very elegant (it's not a one-liner), but it seems to work.
One liner version of the accepted answer (I only have 1 reputation so I can't post a comment):
paste -d= <(iostat -x | awk '{print $1}') <(iostat -xn | awk '{print $NF}') | tail -n +3
Try using the '-n' switch, e.g. 'iostat -n'.
As pointed out in other answers, you can map the device name back to the instance name via the device path and information contained in /etc/path_to_inst. Here is a Perl script that will accomplish the task:
#!/usr/bin/env perl
use strict;
my @path_to_inst = qx#cat /etc/path_to_inst#;
map {s/"//g} @path_to_inst;
my ($device, $path, @instances);
for my $line (qx#ls -l /dev/dsk/*s2#) {
    ($device, $path) = (split(/\s+/, $line))[-3, -1];
    $path =~ s#.*/devices(.*):c#$1#;
    @instances =
        map {join("", (split /\s+/)[-1, -2])}
        grep {/$path/} @path_to_inst;
    for my $instance (@instances) {
        print "$device $instance\n";
    }
}
I found the following in the Solaris Transition Guide:
"Instance Names
Instance names refer to the nth device in the system (for example, sd20).
Instance names are occasionally reported in driver error messages. You can determine the binding of an instance name to a physical name by looking at dmesg(1M) output, as in the following example.
sd9 at esp2: target 1 lun 1
sd9 is /sbus@1,f8000000/esp@0,800000/sd@1,0
<SUN0424 cyl 1151 alt 2 hd 9 sec 80>
Once the instance name has been assigned to a device, it remains bound to that device.
Instance numbers are encoded in a device's minor number. To keep instance numbers consistent across reboots, the system records them in the /etc/path_to_inst file. This file is read only at boot time, and is currently updated by the add_drv(1M) and drvconf"
So based upon that, I wrote the following script:
for device in /dev/dsk/*s2
do
    dpath="$(ls -l $device | nawk '{print $11}')"
    dpath="${dpath#*devices/}"
    dpath="${dpath%:*}"
    iname="$(nawk -v dpath=$dpath '{
        if ($0 ~ dpath) {
            gsub("\"", "", $3)
            print $3 $2
        }
    }' /etc/path_to_inst)"
    echo "$(basename ${device}) = ${iname}"
done
By reading the information directly out of the path_to_inst file, we are allowing for adding and deleting devices, which will skew the instance numbers if you simply count the instances in the /devices directory tree.
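For reference, entries in /etc/path_to_inst look roughly like this (an illustrative line based on the dmesg example quoted above):
"/sbus@1,f8000000/esp@0,800000/sd@1,0" 9 "sd"
i.e. a quoted physical path, the instance number, and the driver name; the nawk above strips the quotes and joins the driver name and instance number to get sd9.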
I think the simplest way to find the descriptive name for a given instance name is:
# iostat -xn sd0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
4.9 0.2 312.1 1.9 0.0 0.0 3.3 3.5 0 1 c1t1d0
#
The last column shows the descriptive name for the provided instance name.
sd0 and sdd0 are instance names of devices. You can check /etc/path_to_inst to get the mapping from instance name to physical device name, then check which physical device the link in /dev/dsk points to. It is a 100% sure method, though I don't know how to code it ;)
I found this snippet on the internet some time ago, and it does the trick. This was on Solaris 8:
#!/bin/sh
cd /dev/rdsk
/usr/bin/ls -l *s0 | tee /tmp/d1c |awk '{print "/usr/bin/ls -l "$11}' | \
sh | awk '{print "sd" substr($0,38,4)/8}' >/tmp/d1d
awk '{print substr($9,1,6)}' /tmp/d1c |paste - /tmp/d1d
rm /tmp/d1[cd]
A slight variation to allow for disk names that are longer than 8 characters (encountered when dealing with disk arrays on a SAN)
#!/bin/sh
cd /dev/rdsk
/usr/bin/ls -l *s0 | tee /tmp/d1c | awk '{print "/usr/bin/ls -l "$11}' | \
sh | awk '{print "sd" substr($0,38,4)/8}' >/tmp/d1d
awk '{print substr($9,1,index($9,"s0")-1)}' /tmp/d1c | paste - /tmp/d1d
rm /tmp/d1[cd]