Frequently Modified Files in Clearcase - version-control

I am very new to ClearCase, and one of the tasks I have been given is to find frequently modified files. Suppose we have an integration stream containing numerous files: I need to identify the files that are modified frequently, for example a file that was modified 5 times in the last two months.
I have access to the ClearCase commands as well as the GUI.
Is there a way to solve this problem?
Thanks

You can do, following the find examples, a search between two dates:
cleartool find . -version "{created_since(date1) &&
!created_since(date2) &&
brtype(myIntStream)}" -exec "cleartool descr -fmt \"%En\n\" \"%CLEARCASE_XPN%\"" \
|sort| uniq -c | sort -n
(This is the Windows syntax, which means you need GoW (Gnu On Windows) installed for the sort and uniq commands.)
As Brian Cowan adds in the comments, the command would be:
cleartool find -all -version "{created_since(date1) &&
!created_since(date2) &&
brtype(myIntStream)}" -exec "cleartool desc -fmt \"%En\n\" \"%CLEARCASE_XPN%\"" \
|sort| uniq -c | sort -n
On Unix:
cleartool find -all -version "{created_since(date1) &&
!created_since(date2) &&
brtype(myIntStream)}" -exec 'cleartool desc -fmt "%En\n" "$CLEARCASE_XPN"' \
|sort| uniq -c | sort -n
Using -all instead of the current directory (.) avoids issues if the command isn't run at the VOB root.
If you don't care about the interval, but only want the last 2 months, drop the !created_since line.
Alternatively, use "today" as the second date, though that would exclude everything modified since midnight, local time, on the day you run the command.
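As a rough illustration of the counting stage only (the part after the cleartool call), here is a minimal shell sketch. The function name frequent_files and its threshold argument are illustrative assumptions, not ClearCase features. It expects one element name per line on standard input, as produced by -fmt "%En\n", and keeps the names that occur at least the given number of times.

```shell
# Hypothetical helper: count how often each element name appears on stdin
# and keep only those seen at least $1 times (default 5), most frequent first.
frequent_files() {
    min=${1:-5}
    sort | uniq -c | awk -v min="$min" '$1 >= min' | sort -rn
}

# Intended use (not run here, needs a ClearCase view):
#   cleartool find -all -version "{...}" -exec '...' | frequent_files 5
```

For the example in the question (a file modified 5 times), the output of the cleartool find command would be piped into frequent_files 5.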

Related

Is there any alternative to the "-mtime" time in Linux?

I would like to delete all files in a specific directory according to the creation (or modification) date. I know that to accomplish this task there is the command:
find /tmp/log -maxdepth 1 -name 'file*' -mmin +60 -type f -exec rm {} \;
The problem is that I'm using an OpenWrt system version that doesn't support the -mtime flag (or -mmin). So is there any alternative to delete files according to the date?
OpenWrt seems to replace most of the Linux commands with BusyBox versions (ouch) (see https://openwrt.org/docs/techref/busybox)
According to BusyBox documentation, the current version does support -mtime, but seemingly your version does not? (I'd try it again, to make sure you didn't goof up the command line)
Anyway, how to get this done using BusyBox commands?
Install a new version of BusyBox?
or Install the "real" version of the find command
or if this command date -d '@946710000' prints a date ~ 1/1/2000, try the following (not tested on BusyBox !!):
cur=`date "+%s"` # current timestamp
old=`echo $cur|awk '{print $1 - (3600*10)}'` # 10 hours ago (3600 s x 10); use 86400*10 for 10 days
for f in *;do
date -r "$f" "+%s^$f" # timestamp of named file
done |
awk -F'^' -v "old=$old" '$1<old{print $2}' # older?
(assumes no ^ in filenames)
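A variant of the same idea as a small function, in case it is easier to adapt. This is only a sketch: the name list_older_than is made up, and it assumes that date -r FILE and shell arithmetic are available (true for GNU date and, as far as I know, for BusyBox date and ash, but untested on OpenWrt). It only prints the matching files, so nothing is deleted until you are happy with the list.

```shell
# Hypothetical helper: print regular files in directory $1 whose
# modification time is more than $2 seconds in the past.
list_older_than() (
    dir=$1
    max_age=$2
    now=$(date +%s)
    cutoff=$((now - max_age))
    for f in "$dir"/*; do
        [ -f "$f" ] || continue                 # skip directories and unmatched globs
        mtime=$(date -r "$f" +%s) || continue   # mtime as seconds since the epoch
        if [ "$mtime" -lt "$cutoff" ]; then
            printf '%s\n' "$f"
        fi
    done
)
```

For the original example (files older than 60 minutes in /tmp/log), that would be list_older_than /tmp/log 3600, and the printed names can then be passed to rm once the output looks right.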

clearcase create diff between parent and child branches

So the typical way I would create a diff log/patch between two branches in clearcase would be to simply create two views and do a typical unix diff. But I have to assume that there is a more clearcase way (and also a '1-liner').
so knowing how to get a list of all files that have been modified on a branch:
cleartool find . -type f -branch "brtype(<BRANCH_NAME>)" -print
and knowing how to get the diff formatted output for two separate files:
cleartool diff FILE FILE@@/main/PARENT_BRANCH_PATH/LATEST
so does anyone see any issues with the following to get a diff for all files that have been changed in a branch?
cleartool find . -type f -branch "brtype(CHILD_BRANCH)" -exec 'cleartool diff -ser $CLEARCASE_PN `echo $CLEARCASE_XPN | sed "s/CHILD_BRANCH/LATEST/"` ' > diff.log
Any modifications and comments are greatly welcomed
thanks in advance!
update: any ideas on how to get this to be a unix unified diff would also be greatly appreciated.
update2: So I think I have my solution, thanks go to VonC for sending me in the right directions:
cleartool find . -type f -branch "brtype(CHILD_BRANCH)" -exec 'cleartool get -to $CLEARCASE_PN.prev `echo $CLEARCASE_XPN | sed "s/CHILD_BRANCH/LATEST/"`; diff -u $CLEARCASE_PN.prev $CLEARCASE_PN; rm -f $CLEARCASE_PN.prev' > CHILD_BRANCH.diff
the output seems to work, I can read the file in via kompare without complaints.
The idea is sound.
I would simply make sure that $CLEARCASE_PN and $CLEARCASE_XPN are enclosed in double quotes, to account for potential spaces in the file path or file name (as illustrated in "How do I list ClearCase versions without the Fully-qualified version?").
cleartool find . -type f -branch "brtype(CHILD_BRANCH)" -exec 'cleartool diff -ser "$CLEARCASE_PN" `echo "$CLEARCASE_XPN" | sed "s/CHILD_BRANCH/LATEST/"` ' > diff.log
Using single quotes for the -exec directive is a good idea, as explained in "CLEARCASE_XPN not parsed as variable in clearcase command".
However, cleartool diff, even with the -ser (-serial_format) option, does not produce exactly a Unix unified diff (or Unified Format for short).
The -diff(_format) option is the closest, as I mention in "How would you measure inserted / changed / removed code lines (LoC)?"
The -diff_format option causes both the headers and differences to be reported in the style of the UNIX and Linux diff utility, writing a list of the changes necessary to convert the first file being compared into the second file.
One idea would be to skip cleartool diff and use diff directly, since in a dynamic view it can access the right version through the extended pathname of the elements found.
The OP ckcin's solution is close to what I suggested with cleartool get:
cleartool find . -type f -branch "brtype(CHILD_BRANCH)" -exec 'cleartool get -to $CLEARCASE_PN.prev `echo $CLEARCASE_XPN | sed "s/CHILD_BRANCH/LATEST/"`; diff -u $CLEARCASE_PN.prev $CLEARCASE_PN; rm -f $CLEARCASE_PN.prev' > CHILD_BRANCH.diff
the output seems to work, I can read the file in via kompare without complaints.
On multiple lines, for readability:
cleartool find . -type f -branch "brtype(CHILD_BRANCH)" \
-exec 'cleartool get -to $CLEARCASE_PN.prev
`echo $CLEARCASE_XPN | sed "s/CHILD_BRANCH/LATEST/"`;
diff -u $CLEARCASE_PN.prev $CLEARCASE_PN;
rm -f $CLEARCASE_PN.prev' > CHILD_BRANCH.diff
(Note that $CLEARCASE_XPN and $CLEARCASE_PN are set by the cleartool find command; they're not variables you set yourself.)
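To see what the sed step does to an extended pathname without needing a view, here is a small sketch. The function name parent_latest_xpn is made up for illustration. Unlike the one-liners above, it anchors the substitution to the end of the string, which is my own assumption about what is wanted: it avoids rewriting a directory that happens to contain the branch name. It also assumes the branch name contains no characters that are special to sed.

```shell
# Hypothetical helper: turn "path@@/main/PARENT/CHILD" into
# "path@@/main/PARENT/LATEST" by replacing the trailing branch name.
# $1: extended pathname as reported for a branch, $2: child branch name.
parent_latest_xpn() {
    printf '%s\n' "$1" | sed "s|/$2\$|/LATEST|"
}
```

For example, parent_latest_xpn "$CLEARCASE_XPN" CHILD_BRANCH could replace the backquoted echo | sed part inside the -exec string.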
Transferring the answers from VonC and einpoklum to Windows, I came up with the following. Create a separate batch file, which I called diffClearCase.bat; this simplifies the command line significantly. It creates a separate tree for all modified files, which I personally liked, but the files and folders can be deleted afterwards.
@echo off
SET PLAINFILE=%1
SET PLAINDIR=%~dp1
SET CLEARCASE_FILE=%2
SET BRANCH_NAME=%3
SET SOURCE_DRIVE=T:
SET TARGET_TEMP_DIR=D:
SET DIFF_TARGET_FILE=D:\allPatch.diff
call set BASE_FILE=%%CLEARCASE_FILE:%BRANCH_NAME%=LATEST%%
call set TARGET_FILE=%%PLAINFILE:%SOURCE_DRIVE%=%TARGET_TEMP_DIR%%%
call set TARGET_DIR=%%PLAINDIR:%SOURCE_DRIVE%=%TARGET_TEMP_DIR%%%
echo Diffing file %PLAINFILE%
IF NOT EXIST %TARGET_DIR% mkdir %TARGET_DIR%
cleartool get -to %TARGET_FILE% %BASE_FILE%
diff -u %TARGET_FILE% %PLAINFILE% >> %DIFF_TARGET_FILE%
rem del /F/Q %TARGET_FILE%
And then I created a second bat file which simply takes the branch name as argument. In our case this directory contains multiple VOBs, so I iterate over them and do this per VOB.
@echo off
SET BRANCHNAME=%1
SET DIFF_TARGET_FILE=D:\allPatch.diff
SET SOURCE_DRIVE=T:
SET DIFF_TOOL=D:\Data\Scripts\diffClearCase.bat
IF EXIST %DIFF_TARGET_FILE% DEL /Q %DIFF_TARGET_FILE%
for /D %%V in ("%SOURCE_DRIVE%\*") DO (
echo Checking VOB %%V
cd %%V
cleartool find %%V -type f -branch "brtype(%BRANCHNAME%)" -exec "%DIFF_TOOL% \"%%CLEARCASE_PN%%\" \"%%CLEARCASE_XPN%%\" %BRANCHNAME%"
)

finding most recent file version from list of file path names with jumbled file names

I recently lost a bunch of files from eclipse in an accidental copy/replace dilemma. I was able to recover most of them but I found in the eclipse metadata folder a history of files, some of which are the ones I need. The path for the history is:
($WORKSPACE/.metadata/.plugins/org.eclipse.core.resources/.history).
Inside there are a bunch of folders like 3e,2f,1a,ff, etc., each with a couple of files named like "2054f7f9a0d30012175be7013ca49f5b". I was able to do a recursive grep with a keyword I know would be in the file and return a list of file names (grep -R -l 'KEYWORD'), and now I can't figure out how to sort them by most recently modified.
any help would be great, thanks!
you can try:
find $WORK.../.history -type f -printf '%T@\t%p\n' | sort -nr | cut -f2- | xargs grep 'your_pattern'
Decomposed:
the find finds all plain files and prints their modification time and path
the sort sorts them numerically and in reverse, so the highest number (the most recently modified file) comes first
the cut removes the time from each line
the xargs runs its argument on the files it receives on its input,
in this case it runs the grep command, so
the first file in which grep finds a match is the most recently modified one
The above does not work when filenames contain spaces, but hopefully that is not your case... The -printf action works only with GNU find.
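If spaces in filenames do turn out to be a problem, one possible workaround is to keep the same pipeline but separate the records with NUL bytes instead of newlines. This is a sketch under the assumption that GNU find, sort, cut and xargs are installed (sort -z and cut -z are GNU extensions); the function name grep_newest_first is made up.

```shell
# Hypothetical helper: list files under directory $1 that contain pattern $2,
# most recently modified first. Records are NUL-terminated, so names with
# spaces or newlines survive the pipeline.
grep_newest_first() {
    find "$1" -type f -printf '%T@\t%p\0' |
        sort -z -nr |                 # newest (largest timestamp) first
        cut -z -f2- |                 # drop the timestamp column
        xargs -0 -r grep -l -- "$2"   # print names of matching files only
}
```

Usage would be something like grep_newest_first "$WORKSPACE/.metadata/.plugins/org.eclipse.core.resources/.history" 'KEYWORD'; the first name printed is the most recently modified match.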
For repetitive work, you can split the command into two parts:
find $WORK.../.history -type f -printf '%T@\t%p\n' | sort -nr | cut -f2- > /somewhere/FILENAMES_SORTED_BY_MODIF_TIME
so in the first step you save the list of filenames, sorted by modification time, somewhere, and afterwards you can repeatedly run grep on their content with:
< /somewhere/FILENAMES_SORTED_BY_MODIF_TIME xargs grep 'your_pattern'
the above command is usually written as
xargs grep 'your_pattern' < /somewhere/FILENAMES_SORTED_BY_MODIF_TIME
but in bash it is fine to put the redirection at the start, and then changing the grep pattern is simpler because the pattern comes last on the line.
If you want to check the list of filenames together with their modification times, you can split the above commands as:
find $WORK.../.history -type f -printf "%T@\t%Tc\t%p\n" | sort -nr >/somewhere/FILENAMES_WITH_DATE
check the list (it now contains a readable date too) and then use
< /somewhere/FILENAMES_WITH_DATE cut -f3- | xargs grep 'your_pattern'
Note that you now need -f3- and not -f2- as in the first example.

How to use multiple files at once using bash

I have a perl script which is used to process some data files from a given directory. I have written the bash script below to look for the last updated file in the given directory and process that file.
cd $data_dir
find \( -type f -mtime -1 \) -exec ./script.pl {} \;
Sometimes a user copies multiple files to the data dir, and the earlier ones are skipped: the perl script only runs on the last updated file. Can you please suggest how to fix this using a bash script?
Try
cd $data_dir
find \( -type f -mtime -1 \) -exec ./script.pl {} +
Note the termination of -exec with a + vs your \;
From the man page
-exec command {} +
This variant of the -exec action runs the specified command on the selected files, but the command line is built by appending each selected file name at the end;
Now that you'll have one or more file names passed into your perl script, you can alter your perl script to iterate over each passed in file name.
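To make the difference between the two terminators visible, here is a small sketch that uses sh -c as a stand-in for ./script.pl and prints how many file names each invocation receives. The function names demo_exec_plus and demo_exec_each are made up for the demonstration.

```shell
# With "+", find appends as many file names as fit onto one command line,
# so the stand-in command typically runs once and sees all the files.
demo_exec_plus() {
    find "$1" -type f -exec sh -c 'echo "$# file(s) in this invocation"' sh {} +
}

# With "\;", the stand-in command is run once per file.
demo_exec_each() {
    find "$1" -type f -exec sh -c 'echo "$# file(s) in this invocation"' sh {} \;
}
```

In a directory with three files, the first function prints a single line reporting 3 files, while the second prints three lines reporting 1 file each, which is why the script has to loop over its arguments when + is used.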
If I understood the question correctly, you need to process any files that were created or modified in a directory since the last time your script was run.
In my opinion find is not the right tool to determine those files, because it has no notion of which files it has already seen.
Using any of the -atime/-ctime/-mtime options will either produce duplicates if you run your script twice in the specified period, or miss some files if it is not executed at the right time. The timing intricacies of using these options for something like this are not easy to deal with.
I can propose a few alternatives:
a) Use three directories instead of one: incoming/ processing/ done/. Your users should only be allowed to put files in incoming/. You move any files in there to processing/ with a simple mv incoming/* processing/ before running your perl script. Then you move them from processing/ to done/ when it's finished.
In my opinion this is the simplest and best solution, and the one used by mail servers etc when dealing with this issue. If I were you and there were not any special circumstances preventing you from doing this, I'd stop reading here.
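A minimal sketch of that three-directory flow, for illustration only. The function name process_incoming and the choice to leave failed files in processing/ are my assumptions; the command to run on each file (for example ./script.pl) is passed after the base directory.

```shell
# Hypothetical helper: $1 is the base directory holding incoming/,
# processing/ and done/; the remaining arguments are the command to run
# on each file.
process_incoming() (
    base=$1
    shift
    mkdir -p "$base/incoming" "$base/processing" "$base/done"
    # Claim everything that has arrived so far.
    for f in "$base"/incoming/*; do
        [ -f "$f" ] || continue
        mv "$f" "$base/processing/"
    done
    # Process each claimed file; move it to done/ only on success.
    for f in "$base"/processing/*; do
        [ -f "$f" ] || continue
        if "$@" "$f"; then
            mv "$f" "$base/done/"
        fi
    done
)
```

It would be called as process_incoming /path/to/data ./script.pl. Files that arrive while the script is running simply stay in incoming/ until the next run.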
b) Have your finder script touch a special file (e.g. .timestamp, perhaps in a different directory, so that your users will not tamper with it) when it's done. This will allow your script to remember the last time it was run. Then use
find \( -cnewer .timestamp -o -newer .timestamp \) -type f -exec ./script.pl '{}' ';'
to run your perl script for each file. You should modify your perl script so that it can run repeatedly with a different file name each time. If you can modify it to accept multiple files in one go, you can also run it with
find \( -cnewer .timestamp -o -newer .timestamp \) -type f -exec ./script.pl '{}' +
which will minimise the number of ./script.pl processes. Take care to handle the first run of the find script, when the .timestamp file is missing. A good solution would be to simply ignore it by not using the -*newer options at all in that case. Also keep in mind that there is a race condition where files added after find was started but before touching the timestamp file will not be processed.
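The first-run handling described above could look roughly like this. It is a sketch, not a complete solution: the function name list_new_files is made up, it only lists candidate files, and it does nothing about the race condition mentioned above.

```shell
# Hypothetical helper: list files under directory $1 that changed since the
# timestamp file $2 was last touched. If the timestamp file does not exist
# yet (first run), every file is listed.
list_new_files() {
    dir=$1
    stamp=$2
    if [ -e "$stamp" ]; then
        find "$dir" -type f \( -cnewer "$stamp" -o -newer "$stamp" \)
    else
        find "$dir" -type f
    fi
}
```

A run would then be something like list_new_files "$data_dir" /var/tmp/.timestamp piped into the processing step, followed by touch /var/tmp/.timestamp once processing has finished (the /var/tmp location is just an example).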
c) As a variation of (b), have your script update the timestamp with the time of the processed file that was created/modified most recently. This is tricky, because find cannot order its output on its own. You could use a wrapper around your perl script to handle this:
#!/bin/bash
for i in "$@"; do
find "$i" \( -cnewer .timestamp -o -newer .timestamp \) -exec touch -r '{}' .timestamp ';'
done
./script.pl "$@"
This will update the timestamp if it is called to process a file with a newer mtime or ctime, minimising (but not eliminating) the race condition. It is however somewhat awkward - unavoidable since bash's [[ -nt option seems to only check the mtime. It might be better if your perl script handled that on its own.
d) Have your script store each processed filename and its timestamps somewhere and then skip duplicates. That would allow you to just pass all files in the directory to it and let it sort out the mess. Kinda tricky though...
e) Since you are using Linux, you might want to have a look at inotify and the inotify-tools package, specifically the inotifywait tool. With a bit of scripting it would allow you to process files as they are added to the directory:
inotifywait -e MOVED_TO -e CLOSE_WRITE -m -r testd/ | grep --line-buffered -e MOVED_TO -e CLOSE_WRITE | while read d e f; do ./script.pl "$f"; done
This has no race conditions, as long as your users do not create/copy/move any directories rather than just files.
The perl script will only execute against the file which find gives it. Perhaps you should remove the -mtime -1 option from the find command so that it picks up all the files in the directory?

Finding most commonly edited files in clearcase

We are currently planning a quality improvement exercise and I would like to target the most commonly edited files in our clearcase vobs. Since we have just been through a bug fixing phase, the most commonly edited files should give a good indication of where the most bug prone code is, and therefore the most in need of quality improvement.
Does anyone know if there is a way of obtaining a top 100 list of most edited files? Preferably this would cover edits that are happening on multiple branches.
(The previous answer was for a simpler case: single branch)
Since "most projects dev has not all happened on the one branch so the version numbers don't necessarily mean most edited", a "way to get number of check-ins across all branches" would be:
search all versions created since the date of the last bug fixing phase,
sort them by file,
then by occurrence.
Something along the lines of:
C:\Prog\cc\test\test>ct find -all -type f -ver "created_since(16-Oct-2009)" -exec "cleartool descr -fmt """%En~%Sn\n""" """%CLEARCASE_XPN%"""" | grep -v "\\0" | awk -F ~ "{print $1}" | sort | uniq -c | sort /R | head -100
Or, for Unix syntax:
$ ct find -all -type f -ver 'created_since(16-Oct-2009)' -exec 'cleartool descr -fmt "%En~%Sn\n" "$CLEARCASE_XPN"' | grep -v "/0" | awk -F ~ '{print $1}' | sort | uniq -c | sort -rn | head -100
Replace the date with that of the label marking the start of your bug-fixing phase.
Again, note the double-quotes around '%CLEARCASE_XPN%' ($CLEARCASE_XPN on Unix) to accommodate spaces within file names.
Here, '%CLEARCASE_XPN%' is used rather than '%CLEARCASE_PN%' because we need every version.
grep -v "/0" is here to exclude version 0 (/main/0, /main/myBranch/0, ...)
awk -F ~ "{print $1}" is used to only print the first part of each line:
C:\Prog\cc\test\test\a.txt~\main\mybranch\2 becomes C:\Prog\cc\test\test\a.txt
From there, the counting and sorting can begin:
sort to make sure every identical line is grouped
uniq -c to remove duplicate lines and precede each remaining line with a count of said duplicates
sort -rn (or sort /R for Windows) for having the most edited files at the top
head -100 for keeping only the 100 most edited files.
Again, GnuWin32 will come in handy for the Windows version of the one-liner.
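The Unix post-processing chain can also be wrapped in a small function and tried on sample input without a VOB. This is only a sketch: the function name top_edited is made up, and anchoring the version-0 filter to the end of the line ('/0$') is my own tightening of grep -v "/0", on the assumption that the intent is to drop only versions whose ID ends in /0.

```shell
# Hypothetical helper: read "element~version-id" lines on stdin (the output
# of -fmt "%En~%Sn\n") and print the $1 most frequently versioned elements
# (default 100), each preceded by its count.
top_edited() {
    grep -v '/0$' |                 # drop version 0 of every branch
        awk -F '~' '{print $1}' |   # keep only the element name
        sort | uniq -c |            # count occurrences per element
        sort -rn |                  # highest count first
        head -n "${1:-100}"
}
```

The cleartool find command above would then end in | top_edited 100 instead of the explicit grep | awk | sort | uniq | sort | head chain.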
(See answer for more complicated case: multiple branches)
First, use a dynamic view: easier and quicker to update its content and fiddle with its config spec rules.
If your bug-fixing has been done in a branch, starting from a given label, set up a dynamic view with the following config spec:
element * .../MY_BRANCH/LATEST
element * MY_STARTING_LABEL
element * /main/LATEST
Then you find all files, with their current version number (closely related to the number of edits)
ct find . -type f -exec "cleartool desc -fmt """%Ln\t\t%En\n""" """%CLEARCASE_PN%""""|sort /R|head -100
This is the Windows syntax (note the triple "double-quotes" around %CLEARCASE_PN% in order to accommodate spaces within the file names).
The 'head' command comes from the GnuWin32 library.
The most edited versions are at the top of the list.
A Unix version would be:
$ ct find . -type f -exec 'cleartool desc -fmt "%Ln\t\t%En\n" "$CLEARCASE_PN"' | sort -rn | head -100
The most edited version would be at the top.
Do not forget that for metrics, the raw numbers are not enough, trends are important too.