Finding most commonly edited files in clearcase - version-control

We are currently planning a quality improvement exercise and i would like to target the most commonly edited files in our clearcase vobs. Since we have just been through a bug fixing phase the most commonly edited files should give a good indication of where the most bug prone code is, and therefore the most in need of quality improvment.
Does anyone know if there is a way of obtaining a top 100 list of most edited files? Preferably this would cover edits that are happening on multiple branches.

(The previous answer was for a simpler case: single branch)
Since "most projects dev has not all happened on the one branch so the version numbers don't necessarily mean most edited", a "way to get number of check-ins across all branches" would be:
search all versions created since the date of the last bug fixing phase,
sort them by file,
then by occurrence.
Something along the lines of:
C:\Prog\cc\test\test>ct find -all -type f -ver "created_since(16-Oct-2009)" -exec "cleartool descr -fmt """%En~%Sn\n""""""%CLEARCASE_XPN%"""" | grep -v "\\0" | awk -F ~ "{print $1}" | sort | uniq -c | sort /R | head -100
Or, for Unix syntax:
$ ct find -all -type f -ver 'created_since(16-Oct-2009)' -exec 'cleartool descr -fmt "%En~%Sn\n" "%CLEARCASE_XPN%"' | grep -v "/0" | awk -F ~ '{print $1}' | sort | uniq -c | sort -rn | head -100
replace the date by the one of the label marking the start of your bug-fixing phase
Again, note the double-quotes around the '%CLEARCASE_XPN%' to accommodate spaces within file names.
Here, '%CLEARCASE_XPN%' is used rather than '%CLEARCASE_PN%' because we need every versions.
grep -v "/0" is here to exclude version 0 (/main/0, /main/myBranch/0, ...)
awk -F ~ "{print $1}" is used to only print the first part of each line:
C:\Prog\cc\test\test\a.txt~\main\mybranch\2 becomes C:\Prog\cc\test\test\a.txt
From there, the counting and sorting can begin:
sort to make sure every identical line is grouped
uniq -c to remove duplicate lines and precede each remaining line with a count of said duplicates
sort -rn (or sort /R for Windows) for having the most edited files at the top
head -100 for keeping only the 100 most edited files.
Again, GnuWin32 will come in handy for the Windows version of the one-liner.

(See answer for more complicated case: multiple branches)
First, use a dynamic view: easier and quicker to update its content and fiddle with its config spec rules.
If your bug-fixing has been made in a branch, starting from a given label, set-up a dynamic view with the following config spec as:
element * .../MY_BRANCH/LATEST
element * MY_STARTING_LABEL
element * /main/LATEST
Then you find all files, with their current version number (closely related to the number of edits)
ct find . -type f -exec "cleartool desc -fmt """%Ln\t\t%En\n""" """%CLEARCASE_PN%""""|sort /R|head -100
This is the Windows syntax (nothe the triple "double-quotes" around %CLEARCASE_PN% in order to accommodate spaces within the file names.
the 'head' command comes from the GnuWin32 library.
The most edited version are at the top of the list.
A Unix version would be:
$ ct find . -type f -exec 'cleartool desc -fmt "%Ln\t\t%En\n" "$CLEARCASE_PN"' | sort -rn | head -100
The most edited version would be at the top.
Do not forget that for metrics, the raw numbers are not enough, trends are important too.

Related

Frequently Modified Files in Clearcase

I am very new to Clearcase and one of the task that I have got on my hand is to find frequently modified files in ClearCase, Suppose we have an integration stream and there are numerous files in our stream, need to know of certain files which are modified frequently like a certain file is modified 5 times in last two months.
I have access to ClearCase commands as well as GUI
Is there a way we can have solution to this problem.
Thanks
You can do, following find examples, a search between two dates:
cleartool find . -version "{created_since(date1) &&
!created_since(date2) &&
brtype(myIntStream)" -exec "cleartool descr -fmt "%En"\
|sort| uniq -c | sort -n
(This is the Windows syntax, which means you need GoW (Gnu On Windows) installed for the v and uniq commands.
As Brian Cowan adds in the comments, the command would be:
cleartool find -all -version "{created_since(date1) &&
!created_since(date2) &&
brtype(myIntStream)" -exec "cleartool desc -fmt \"%En\n\" \"%CLEARCASE_XPN\"" \
|sort| uniq -c | sort -n
On Unix:
cleartool find -all -version "{created_since(date1) &&
!created_since(date2) &&
brtype(myIntStream)" -exec 'cleartool desc -fmt "%En\n" "$CLEARCASE_XPN"' \
|sort| uniq -c | sort -n
-all instead of the current directory format, to avoid issues if the command isn't run at the VOB root.
If you don't care about the interval, but only want the last 2 months, drop the !created_since line.
Alternatively, use "today" as the second date, though that would everything modified since midnight your local on the day you run the command.

Renaming files/dir's from one date format to another

So I am a coding newbie and have, for some time, wanted to edit the formatting of my fairly extensive live music library. I have looked around on here and various other resources to get to where I am, but I have hit a snag. I have directories named in the following ways:
02.10.90 | 23 East Caberet - Ardmore, PA
02.16.90 | The Paradise - Boston, MA
and I would like to rename these simply to
1990-02-10 | 23 East Caberet - Ardmore, PA
1990-02-16 | The Paradise - Boston, MA
I have been able to rename the date correctly using:
ls -1 | grep 90 | awk '{print $1}' | awk -F. '{printf "%s-%s-%s\n", "19"$3,$1,$2}' > list1.txt
and then pull the rest of the name using
ls -1 | grep 90 | awk '{first = $1; $1 = ""; print $0}'>list2.txt
So, I have a list of directories ranging from years 1990-2004 that I would like to apply this to (they are all in different sub directories so I don't mind manually changing the "grep 90". However, from the two separate lists that I generate, I cant figure out how to make it loop through each row and print "mv original_name list1.txt+list2.txt" so that it would read:
mv 02.10.90 | 23 East Caberet - Ardmore, PA 1990-02-10 | 23 East Caberet - Ardmore, PA
I scanned through many previous posts and couldn't quite figure out the last bit - or better yet, a more elegant solution! Any help is greatly appreciated, thank you in advance!
Don't parse the output from ls, google why, and you don't need grep when you're using awk, nor do you need chains of awk commands but for this task you wouldn't use any of those commands anyway.
The UNIX command to find files is named find so start with that. This will find all directories with names starting with the given globbing pattern:
find . -type d -name '[0-9][0-9].[0-9][0-9].90 *'
Now that you've found the files you need to do something with them. For your needs IMHO the simplest approach is best and that'd be:
find . -type d -name '[0-9][0-9].[0-9][0-9].90 *' -print0 |
while IFS= read -r -d '' old; do
path="$(dirname "$old")"
oldDirName="$(basename "$old")"
if [[ oldDirName =~ ([0-9]+)\.([0-9]+)\.([0-9]+)( .*) ]]; then
newDirName="19${BASH_REMATCH[3]}-${BASH_REMATCH[1]}-${BASH_REMATCH[2]}${BASH_REMATCH[4]}
echo mv -- "${path}/${oldDirName}" "${path}/${newDirName}"
fi
done
The above is using GNU find for -print0 and bash for BASH_REMATCH. Remove the echo when you've debugged it if necessary and are happy with what it's going to do.

Using xargs arguments twice

I need to check if local file is same as remote host file.
The file locations are like below:
File1 at Local machine
./remotehostname/home/a/b/scripts/xyz.cpp
File2 at remote machine
remotehostname:/home/a/b/scripts/xyz.cpp
I intend to compare these 2 files, using the command
diff ./remotehostname/home/a/b/scripts/xyz.cpp remotehostname:/home/a/b/scripts/xyz.cpp
find . -type f | grep -v .svn |xargs -I % diff %
I need to change % to take remotehost and compare the file.
Not sure how to apply sed on %. Or is there a better way to compare such files.
One way could be to save the list of files and then apply sed on that file, but I think there should be an even better way. Also the diff doesnt work on remote hosts, maybe I need to use output of dry rsync?
This can be done with xargs, but I prefer to use while read in bash.
xargs method
find . -type f | grep -v .svn | sed 's/.*/& remotehostname:&/' | xargs -n2 diff
The sed command duplicates the input and makes whatever modifications you need. The xargs then passes the inputs to diff two at a time. This will not work if any filename contain spaces.
bash method
find . -type f | grep -v .svn | while read line; do
diff "$line" "remotehostname:$line"
done
The bash read command reads a line from stdin, places it in the name variable, $line, and returns true. You can then put whatever you like inside the loop, so you get total freedom to rewrite the filename however you need. When the input runs out, read returns false, and the loop exits.
Note that piping things into loops has some interesting side effects that are not relevant here, but might bite you one day.
If you are interested in the actual difference (and not just whether they differ - which rsync is brilliant for telling you) then you can do this using GNU Parallel:
find . -type f | grep -v .svn |
parallel diff {} '<(ssh {= s:./::;s:/.*:: =} cat {= s:([^/]+/){2,2}::;$_=::shell_quote_scalar($_) =})'
s:./::;s:/.*:: = hostname from path
s:([^/]+/){2,2}:: = rest of path
::shell_quote_scalar = \-quote special chars as needed by the shell
GNU Parallel is a general parallelizer and makes is easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to. It can often replace a for loop.
If you have 32 different jobs you want to run on 4 CPUs, a straight forward way to parallelize is to run 8 jobs on each CPU:
GNU Parallel instead spawns a new process when one finishes - keeping the CPUs active and thus saving time:
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

clearcase create diff between parent and child branches

So the typical way I would create a diff log/patch between two branches in clearcase would to simply create two views and do a typical unix diff. But I have to assume that there is a more clearcase way (and also a '1-liner').
so knowing how to get a list of all files that have been modified on a branch:
cleartool find . -type f -branch "brtype(<BRANCH_NAME>)" -print
and knowing how to get the diff formatted output for two separate files:
cleartool diff FILE FILE##/main/PARENT_BRANCH_PATH/LATEST
so does anyone see any issues with the following to get a diff for all files that have been changed in a branch?
cleartool find . -type f -branch "brtype(CHILD_BRANCH)" -exec 'cleartool diff -ser $CLEARCASE_PN `echo $CLEARCASE_XPN | sed "s/CHILD_BRANCH/LATEST/"` ' > diff.log
Any modifications and comments are greatly welcomed
thanks in advance!
update: any ideas on how to get this too be a unix unified diff would also be greatly appreciated.
update2: So I think I have my solution, thanks go to VonC for sending me in the right directions:
cleartool find . -type f -branch "brtype(CHILD_BRANCH)" -exec 'cleartool get -to $CLEARCASE_PN.prev `echo $CLEARCASE_XPN | sed "s/CHILD_BRANCH/LATEST/"`; diff -u $CLEARCASE_PN.prev $CLEARCASE_PN; rm -f $CLEARCASE_PN.prev' > CHILD_BRANCH.diff
the output seems to work, I can read the file in via kompare without complaints.
The idea is sound.
I would simply make sure the $CLEARCASE_PN and $CLEARCASE_XPN are used with double quotes around them, to take into account with potential spaces in the file path or file name (as illustrated in "How do I list ClearCase versions without the Fully-qualified version?").
cleartool find . -type f -branch "brtype(CHILD_BRANCH)" -exec 'cleartool diff -ser "$CLEARCASE_PN" `echo "$CLEARCASE_XPN" | sed "s/CHILD_BRANCH/LATEST/"` ' > diff.log
Using simple quotes for the -exec directive is a good idea, as explained in "CLEARCASE_XPN not parsed as variable in clearcase command".
However, cleartool diff, even with the -ser (-serial) option don't produce exactly an Unix unified diff format (or Unified Format for short).
The -diff(_format) option is the closest, as I mention in "How would you measure inserted / changed / removed code lines (LoC)?"
The -diff_format option causes both the headers and differences to be reported in the style of the UNIX and Linux diff utility, writing a list of the changes necessary to convert the first file being compared into the second file.
One idea would be to not use cleartool diff, but use directly diff, since it can access in a dynamic view the right version through the extended pathname of the elements found.
The OP ckcin's solution is close that what I suggested with cleartool get:
cleartool find . -type f -branch "brtype(CHILD_BRANCH)" -exec 'cleartool get -to $CLEARCASE_PN.prev `echo $CLEARCASE_XPN | sed "s/CHILD_BRANCH/LATEST/"`; diff -u $CLEARCASE_PN.prev $CLEARCASE_PN; rm -f $CLEARCASE_PN.prev' > CHILD_BRANCH.diff
the output seems to work, I can read the file in via kompare without complaints.
In multiple line, for readability:
cleartool find . -type f -branch "brtype(CHILD_BRANCH)"
-exec 'cleartool get -to $CLEARCASE_PN.prev
`echo $CLEARCASE_XPN | sed "s/CHILD_BRANCH/LATEST/"`;
diff -u $CLEARCASE_PN.prev $CLEARCASE_PN;
rm -f $CLEARCASE_PN.prev' > CHILD_BRANCH.diff
(Note that $CLEARCASE_XPN and $CLEARCASE_PN are set by the cleartool find commant, they're not variables you set yourself.)
Transferring the answer from VonC and einpoklum to Windows I came up with the following. Create a separate batch file, which I called diffClearCase.bat, this eases up the command line significantly. It creates a separate tree for all modified files, which I personally liked, but the file and folders can be deleted afterwards.
#echo off
SET PLAINFILE=%1
SET PLAINDIR=%~dp1
SET CLEARCASE_FILE=%2
SET BRANCH_NAME=%3
SET SOURCE_DRIVE=T:
SET TARGET_TEMP_DIR=D:
SET DIFF_TARGET_FILE=D:\allPatch.diff
call set BASE_FILE=%%CLEARCASE_FILE:%BRANCH_NAME%=LATEST%%
call set TARGET_FILE=%%PLAINFILE:%SOURCE_DRIVE%=%TARGET_TEMP_DIR%%%
call set TARGET_DIR=%%PLAINDIR:%SOURCE_DRIVE%=%TARGET_TEMP_DIR%%%
echo Diffing file %PLAINFILE%
IF NOT EXIST %TARGET_DIR% mkdir %TARGET_DIR%
cleartool get -to %TARGET_FILE% %BASE_FILE%
diff -u %TARGET_FILE% %PLAINFILE% >> %DIFF_TARGET_FILE%
rem del /F/Q %TARGET_FILE%
And then I created a second bat file which simply takes the branch name as argument. In our case this directory contains multiple VOBs, so I iterate over them and do this per VOB.
#echo off
SET BRANCHNAME=%1
SET DIFF_TARGET_FILE=D:\allPatch.diff
SET SOURCE_DRIVE=T:
SET DIFF_TOOL=D:\Data\Scripts\diffClearCase.bat
IF EXIST %DIFF_TARGET_FILE% DEL /Q %DIFF_TARGET_FILE%
for /D %%V in ("%SOURCE_DRIVE%\*") DO (
echo Checking VOB %%V
cd %%V
cleartool find %%V -type f -branch "brtype(%BRANCHNAME%)" -exec "%DIFF_TOOL% \"%%CLEARCASE_PN%%\" \"%%CLEARCASE_XPN%%\" %BRANCHNAME%"
)

finding most recent file version from list of file path names with jumbled file names

I recently lost a bunch of files from eclipse in an accidental copy/replace dilema. I was able to recover most of them but I found in the eclipse metadata folder a history of files, some of which are the ones I need. The path for the history is:
($WORKSPACE/.metadata/.plugins/org.eclipse.core.resources/.history).
Inside there are a bunch of folders like 3e,2f,1a,ff, etc.. each with a couple files named like "2054f7f9a0d30012175be7013ca49f5b". I was able to do a recursive grep with a keyword i know would be in the file and return a list of file names (grep -R -l 'KEYWORD') and now I can't figure out how to sort them by most recently modified.
any help would be great, thanks!
you can try:
find $WORK.../.history -type f -printf '%T#\t%p\n' | sort -nr | cut -f2- | xargs grep 'your_pattern'
Decomposed:
the find finds all plain files and prints their modification time and path
the sort sort sort them numerically - and reverse, so highest number comes first (the latest modified)
the cut removes the time from each line
the xargs run its argument for each file what get to it input,
in this case will run the grep command, so
the 1st file what the grep find - was the lastest modified
The above not works when the filenames containing spaces, but hopefully this is not your case... The -printf works only with GNU find.
For the repetative work, you can split the command to two parts:
find $WORK.../.history -type f -printf '%T#\t%p\n' | sort -nr | cut -f2- > /somewhere/FILENAMES_SORTED_BY_MODIF_TIME
so in 1st step you save to somewhere the list of filenames sorted by their modification times, and after you can repeatedly use the grep command on their content with:
< /somewhere/FILENAMES_SORTED_BY_MODIF_TIME xargs grep 'your_pattern'
the above command is usually written as
xargs grep 'your_pattern' < /somewhere/FILENAMES_SORTED_BY_MODIF_TIME
but for the bash is OK write the redirection to the start and in this case is simpler changing the pattern for the grep if the pattern is in the last place...
If you want check the list of filenames with modification times, you can break the above commands as:
find $WORK.../.history -type f -printf "%T#\t%Tc\t%p\n" | sort -nr >/somewehre/FILENAMES_WITH_DATE
check the list (they now contains readable date too) and use the next
< /somewehre/FILENAMES_WITH_DATE cut -f3- | xargs grep 'your_pattern'
note, now need to use -f3- and not -f2- as in the 1st example.