Finding the most recent file version from a list of file paths with jumbled file names (Eclipse)

I recently lost a bunch of files from Eclipse in an accidental copy/replace dilemma. I was able to recover most of them, but I found a history of files in the Eclipse metadata folder, some of which are the ones I need. The path for the history is:
($WORKSPACE/.metadata/.plugins/org.eclipse.core.resources/.history).
Inside there are a bunch of folders like 3e, 2f, 1a, ff, etc., each with a couple of files named like "2054f7f9a0d30012175be7013ca49f5b". I was able to do a recursive grep with a keyword I know would be in the file and get a list of file names (grep -R -l 'KEYWORD'), but now I can't figure out how to sort them by most recently modified.
Any help would be great, thanks!

you can try:
find $WORK.../.history -type f -printf '%T@\t%p\n' | sort -nr | cut -f2- | xargs grep 'your_pattern'
Decomposed:
find finds all plain files and prints their modification time (seconds since the epoch) and path
sort sorts them numerically and in reverse, so the highest number (the latest modified) comes first
cut removes the time from each line
xargs runs its argument command (here grep) with the file names it reads on its input, keeping their order, so the first file grep reports is the most recently modified one
The above does not work when file names contain spaces, but hopefully that is not your case. Also, -printf works only with GNU find.
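If the workspace path or the file names might contain spaces, a NUL-delimited variant of the same pipeline avoids the splitting problem. This is a sketch assuming GNU find, sort, cut and xargs (cut -z needs a reasonably recent GNU coreutils); newest_first is a hypothetical helper name, and it uses grep -l so that only the matching file names are printed, newest first:

```shell
#!/usr/bin/env bash
# Sketch assuming GNU findutils/coreutils. newest_first is a hypothetical helper:
# it lists files under DIR that contain PATTERN, most recently modified first.
newest_first() {
  local dir=$1 pattern=$2
  # %T@ = modification time in seconds since the epoch; \0 ends each record,
  # so spaces in paths survive every stage of the pipe.
  find "$dir" -type f -printf '%T@\t%p\0' |
    sort -z -nr |                        # newest first, records separated by NUL
    cut -z -f2- |                        # drop the timestamp field, keep the path
    xargs -0 -r grep -l -- "$pattern"    # -r: run nothing if there are no files
}

# Usage (the directory is the Eclipse history folder from the question):
# newest_first "$WORKSPACE/.metadata/.plugins/org.eclipse.core.resources/.history" 'KEYWORD'
```

xargs may split a long list into several grep runs, but the batches are run in order, so the newest-first ordering is kept.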
For repetitive work, you can split the command into two parts:
find $WORK.../.history -type f -printf '%T@\t%p\n' | sort -nr | cut -f2- > /somewhere/FILENAMES_SORTED_BY_MODIF_TIME
In the first step you save the list of file names sorted by modification time somewhere; afterwards you can grep their contents repeatedly with:
< /somewhere/FILENAMES_SORTED_BY_MODIF_TIME xargs grep 'your_pattern'
The above command is usually written as
xargs grep 'your_pattern' < /somewhere/FILENAMES_SORTED_BY_MODIF_TIME
but bash also accepts the redirection at the start, and then changing the grep pattern is easier because the pattern is the last thing on the line.
If you want to check the list of file names along with their modification times, you can split the above commands as:
find $WORK.../.history -type f -printf "%T@\t%Tc\t%p\n" | sort -nr > /somewhere/FILENAMES_WITH_DATE
Check the list (it now contains a human-readable date too), then use:
< /somewhere/FILENAMES_WITH_DATE cut -f3- | xargs grep 'your_pattern'
Note that you now need -f3- rather than -f2- as in the first example, because each line has an extra date field.
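Since the question already has a list from grep -R -l, another option is to filter first and sort afterwards: let grep produce the matching names, then have stat print each one's modification time for sorting. A sketch assuming GNU grep (for -Z) and GNU stat (for --printf); matches_by_mtime is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch assuming GNU grep and GNU stat. matches_by_mtime is a hypothetical helper:
# grep finds the matching files, stat prints "mtime<TAB>name", sort orders them.
matches_by_mtime() {
  local dir=$1 pattern=$2
  # -Z ends each file name with NUL so xargs -0 can split the list safely.
  grep -R -l -Z -- "$pattern" "$dir" |
    xargs -0 -r stat --printf '%Y\t%n\n' |   # %Y = mtime, seconds since the epoch
    sort -nr |                               # newest first
    cut -f2-                                 # keep only the file name
}

# Usage: matches_by_mtime "$WORKSPACE/.metadata/.plugins/org.eclipse.core.resources/.history" 'KEYWORD'
```

Because sorting happens after grep, only the matching files are stat'ed; the final newline-separated output still assumes the names contain no newlines.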

Related

How to rename all the files (without for loop) in a single line command?

I want to rename all the files in my home directory (for example abc) to the format abc_bkp, without using any loops, as a single-line command in Unix (bash script).
If the directory contains nothing but files, this should do it:
ls | xargs -I {} mv {} {}_bkp
If it contains subdirectories, links, and other things you don't want to rename, you must filter the output of ls. Here is a crude way to do it (it still breaks on names containing spaces); maybe someone can suggest a more elegant approach:
ls -l | grep '^-' | awk '{print $NF}' | xargs -I {} mv {} {}_bkp
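If you don't want to use loops, I believe the best way is the find command. Try the following command as a dry run first, and once you are satisfied with the result, remove the echo to do it for real (-mindepth 1 keeps . itself out of the list, and -maxdepth 1 stops find from descending into directories that are about to be renamed):
find . -mindepth 1 -maxdepth 1 \( -type f -o -type d \) | xargs -I % echo mv % %_bkp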
-I: from the xargs man page:
-I replace-str
Replace occurrences of replace-str in the initial-arguments with names read from standard input. Also, unquoted blanks do not terminate input items; instead the separator is the newline character. Implies -x and -L 1.
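If some names contain spaces or other unusual characters, a NUL-delimited version of the find approach is safer. A sketch assuming find and xargs support -print0/-0 (GNU and BSD do) and that sort supports -z (GNU); backup_names is a hypothetical helper name, and it renames only regular files directly inside the given directory:

```shell
#!/usr/bin/env bash
# Sketch: rename every regular file directly inside DIR from NAME to NAME_bkp.
# backup_names is a hypothetical helper; put "echo" before "mv" for a dry run.
backup_names() {
  local dir=${1:-.}
  # -maxdepth 1: do not descend into subdirectories.
  # -print0 / -0: NUL-separated names, so spaces and quotes are safe.
  # sort -z reads the whole list before printing anything, so find has finished
  # reading the directory before the first mv creates a *_bkp entry in it.
  find "$dir" -maxdepth 1 -type f -print0 |
    sort -z |
    xargs -0 -r -I {} mv -- {} {}_bkp
}

# Usage: backup_names ~
```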

sed with filename from pipe

In a folder I have many files with several parameters in their file names, e.g. (with just one parameter) file_a1.0.txt, file_a1.2.txt, etc.
These are generated by C++ code, and I need to take the last one generated (in time). I don't know in advance what the value of this parameter will be when the code terminates. After that, I need to copy the second line of this last file.
To copy the second line of any file, I know that this sed command works:
sed -n 2p filename
I know also how to find the last generated file:
ls -rtl file_a*.txt | tail -1
Question:
How can I combine these two operations? Surely it is possible to pipe the second operation into that sed command, but I don't know how to pass the file name from the pipe as input to sed.
You can use this,
ls -rt1 file_a*.txt | tail -1 | xargs sed -n '2p'
(OR)
sed -n '2p' `ls -rt1 file_a*.txt | tail -1`
sed -n '2p' $(ls -rt1 file_a*.txt | tail -1)
Typically you can put a command in backticks to insert its output at a particular point in another command, so
sed -n 2p `ls -rt name*.txt | tail -1 `
Alternatively (and preferably, because it is easier to nest, etc.):
sed -n 2p $(ls -rt name*.txt | tail -1)
-r in ls means reverse order:
-r, --reverse
reverse order while sorting
But it is not a good idea to use it with tail -1.
With the change below (head -1, and no -r option for ls), the rest of the list does not have to be piped through to tail. ls still reads and sorts every entry before printing the first one, though, so the saving is modest.
sed -n 2p $(ls -t1 name*.txt | head -1 )
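Parsing ls output can go wrong for names with unusual characters. bash's -nt ("newer than") test can pick the newest file without parsing ls at all. A sketch; second_line_of_newest is a hypothetical helper name, and the file_a*.txt pattern comes from the question:

```shell
#!/usr/bin/env bash
# Sketch: print line 2 of the most recently modified file among the arguments,
# using bash's -nt ("newer than") test instead of parsing ls output.
# second_line_of_newest is a hypothetical helper name.
second_line_of_newest() {
  local newest='' f
  for f in "$@"; do
    # Keep f if nothing is chosen yet or f was modified more recently.
    if [[ -z $newest || $f -nt $newest ]]; then
      newest=$f
    fi
  done
  [[ -n $newest ]] || return 1   # no files given
  sed -n 2p -- "$newest"
}

# Usage: second_line_of_newest file_a*.txt
```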
I was looking for a similar solution: taking the file names from a pipe of grep results to feed to sed. I've copied my answer here for the search & replace, but perhaps this example can help as it calls sed for each of the names found in the pipe:
This command simply finds all the files containing foo:
grep -i -l -r foo ./*
This one excludes this_shell.sh (in case you put the command in a script called this_shell.sh), tees the output to the console so you can see what happened, and then runs sed on each file name found to replace the text foo with bar:
grep -i -l -r --exclude "this_shell.sh" foo ./* | tee /dev/fd/2 | while IFS= read -r x; do sed -b -i 's/foo/bar/gi' "$x"; done
I chose this method because I didn't like having the timestamps changed on files that were not modified. Feeding it the grep result means only the files containing the target text are touched (which likely improves speed as well).
Be sure to back up your files and test before using this. Because each name is read as a whole line and quoted, embedded spaces are handled, but file names containing newlines are not.
FWIW, I had some problems using the tail method: it seems the entire data set was generated before tail was called to pick just the last item.
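The same "only touch the files grep found" idea can be written without the while loop by handing grep's NUL-terminated list straight to sed. A sketch assuming GNU grep (-Z) and GNU sed (-i, and the I flag for case-insensitive matching); replace_in_matches is a hypothetical helper name, and the advice to back up and test first still applies:

```shell
#!/usr/bin/env bash
# Sketch assuming GNU grep and GNU sed. replace_in_matches is a hypothetical
# helper: replace FROM with TO (case-insensitively) only in files that contain FROM.
replace_in_matches() {
  local from=$1 to=$2 dir=${3:-.}
  # -r: recurse, -i: ignore case, -l: names only, -Z: NUL after each name.
  grep -rilZ -- "$from" "$dir" |
    xargs -0 -r sed -i "s/$from/$to/gI"   # FROM and TO must not contain "/"
}

# Usage: replace_in_matches foo bar .
```

Files without a match are never passed to sed, so their modification times stay unchanged.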

How to search inside a list of files for multiple values existing on the same line?

I am trying to do a search in one directory containing a large number of html files, to find those files that contain the exact values on the same line. This should work:
grep -iwc 'word1' -sl | xargs grep -iwc 'word2' -s
But that only works on one file at a time. I tried something like this:
find . -iname '*html' | xargs grep -iwc 'word1' -sl | xargs grep -iwc 'word2' -s
But that seems to display files containing the two values even when they are not on the same line.
The output should only be the file names and the number of occurrences like:
file.html:2
Is it possible to group those two greps? Or is there another way to do this search?
An extended regex may help. Something like this perhaps?
find . -iname '*html' | xargs grep -Eiwl '(word1.*word2|word2.*word1)'
Since you only have two words you're looking for, it's not too hard to list off all the ways to order them.
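The question also asks for output in the form file.html:2. One way to get that is grep -c, which counts matching lines (not individual occurrences) per file; -H forces the file name even when xargs happens to pass a single file, and results ending in :0 are dropped. This sketch assumes GNU grep, whose extended regexes accept \b as a word boundary (used here instead of -w, which only checks the edges of the whole match); count_both is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch assuming GNU grep. count_both is a hypothetical helper: for each *html
# file under DIR, print "file:N" where N is the number of lines containing both
# WORD1 and WORD2 as whole words, in either order (case-insensitive).
count_both() {
  local w1=$1 w2=$2 dir=${3:-.}
  local re="\\b$w1\\b.*\\b$w2\\b|\\b$w2\\b.*\\b$w1\\b"
  find "$dir" -iname '*html' -type f -print0 |
    xargs -0 -r grep -EicH -- "$re" |   # -c counts lines, -H always shows names
    grep -v ':0$'                       # hide files with no such line
}

# Usage: count_both word1 word2 .
```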

Grep data and output to file

I'm attempting to extract data from log files and organise it systematically. I have about 9 log files which are ~100mb each in size.
What I'm trying to do is: Extract multiple chunks from each log file, and for each chunk extracted, I would like to create a new file and save this extracted data to it. Each chunk has a clear start and end point.
Basically, I have made some progress and am able to extract the data I need, however, I've hit a wall in trying to figure out how to create a new file for each matched chunk.
I'm unable to use a programming language like Python or Perl, due to the constraints of my environment. So please excuse the messy command.
My command thus far:
find Logs\ 13Sept/Log_00000000*.log -type f -exec \
sed -n '/LRE Starting chunk/,/LRE Ending chunk/p' {} \; | \
grep -v -A1 -B1 "Starting chunk" > Logs\ 13Sept/Chunks/test.txt
The LRE Starting chunk and LRE Ending chunk lines are my boundaries. Right now my command works, but it saves all matched chunks to one file (whose size is becoming excessive).
How do I create a new file for each match and add the matched content to it, keeping in mind that each log file can hold multiple chunks and is not limited to one chunk per file?
You probably need something more programmable than sed; I'm assuming awk is available.
awk '
/LRE Ending chunk/ {printing = 0; close(out)}
printing {print > out}
/LRE Starting chunk/ {printing = 1; n++; out = "chunk" n ".txt"}
' *.log
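If the chunk files should say which log they came from, the same awk idea can build the output name from FILENAME and restart the counter for each input file. A sketch; split_log_chunks is a hypothetical helper name, the <log>.chunkN.txt naming is an assumption, and close() keeps the number of simultaneously open files small:

```shell
#!/usr/bin/env bash
# Sketch: write every chunk of every log given as an argument to its own file,
# named <log>.chunk<N>.txt next to the log (the naming scheme is an assumption).
split_log_chunks() {
  awk '
    FNR == 1 { n = 0; printing = 0; close(out) }   # new input file: reset state
    /LRE Ending chunk/ { printing = 0; close(out) }
    printing { print > out }
    /LRE Starting chunk/ { printing = 1; n++; out = FILENAME ".chunk" n ".txt" }
  ' "$@"
}

# Usage: split_log_chunks "Logs 13Sept"/Log_00000000*.log
```

The marker lines themselves are not copied: the end marker switches printing off before the print rule runs, and the start marker switches it on only after.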
Try something like this:
find Logs\ 13Sept/Log_00000000*.log -type f -print | while IFS= read -r file; do
sed -n '/LRE Starting chunk/,/LRE Ending chunk/p' "$file" |
grep -v -A1 -B1 "Starting chunk" > "Logs 13Sept/Chunks/${file##*/}.chunk.txt"
done
This loops over the find results and, for each log file, creates one .chunk.txt file named after that log in the Chunks folder, so all chunks from the same log still end up in a single file.
Something like this perhaps?
find Logs\ 13Sept/Log_00000000*.log -type f -exec \
sed -n '/LRE Starting chunk/,/LRE Ending chunk/{
/LRE .*ing chunk/d
w {}.chunk
}' {} \;
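This uses sed's w command to write to a file named (inputfile).chunk. If that is not acceptable, perhaps you can use sh -c '...' to pass in a small shell script that wraps the sed command. (Or is a shell script also prohibited for some reason?) Note that replacing {} inside a longer argument, as done here, is something GNU find does; POSIX leaves it implementation-defined.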
Perhaps you could use csplit to do the splitting, then truncate the output files at the chunk end.
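The csplit suggestion could look roughly like this. A sketch assuming GNU csplit (for -z and the {*} repeat count) and GNU sed (for -i); split_chunks and the chunk-file prefix are hypothetical names. Each piece csplit writes starts at an "LRE Starting chunk" line, so the function drops that line, cuts everything from "LRE Ending chunk" onward, and deletes the piece holding any text before the first chunk:

```shell
#!/usr/bin/env bash
# Sketch assuming GNU csplit and GNU sed. split_chunks is a hypothetical helper:
# split LOG into PREFIX000.txt, PREFIX001.txt, ... with one chunk body per file.
split_chunks() {
  local log=$1 prefix=$2 piece
  # Cut the log before every "LRE Starting chunk" line; -z drops empty pieces,
  # -s suppresses the byte counts csplit normally prints.
  csplit -s -z -f "$prefix" -b '%03d.txt' "$log" '/LRE Starting chunk/' '{*}'
  for piece in "$prefix"*.txt; do
    if head -n 1 "$piece" | grep -q 'LRE Starting chunk'; then
      # Drop the start marker, then everything from the end marker onward.
      sed -i -e '1d' -e '/LRE Ending chunk/,$d' "$piece"
    else
      rm -f -- "$piece"   # text that came before the first chunk
    fi
  done
}

# Usage (paths are placeholders): split_chunks "Logs 13Sept/some.log" "Logs 13Sept/Chunks/some_"
```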

Finding most commonly edited files in clearcase

We are currently planning a quality improvement exercise, and I would like to target the most commonly edited files in our ClearCase VOBs. Since we have just been through a bug-fixing phase, the most commonly edited files should give a good indication of where the most bug-prone code is, and therefore what is most in need of quality improvement.
Does anyone know if there is a way of obtaining a top 100 list of most edited files? Preferably this would cover edits that are happening on multiple branches.
(My other answer, below, covers the simpler single-branch case.)
Since "most projects dev has not all happened on the one branch so the version numbers don't necessarily mean most edited", a "way to get number of check-ins across all branches" would be:
search all versions created since the date of the last bug fixing phase,
sort them by file,
then by occurrence.
Something along the lines of:
C:\Prog\cc\test\test>ct find -all -type f -ver "created_since(16-Oct-2009)" -exec "cleartool descr -fmt """%En~%Sn\n""""""%CLEARCASE_XPN%"""" | grep -v "\\0" | awk -F ~ "{print $1}" | sort | uniq -c | sort /R | head -100
Or, for Unix syntax:
$ ct find -all -type f -ver 'created_since(16-Oct-2009)' -exec 'cleartool descr -fmt "%En~%Sn\n" "$CLEARCASE_XPN"' | grep -v "/0" | awk -F ~ '{print $1}' | sort | uniq -c | sort -rn | head -100
Replace the date with the creation date of the label that marks the start of your bug-fixing phase.
Again, note the double-quotes around the '%CLEARCASE_XPN%' to accommodate spaces within file names.
Here, '%CLEARCASE_XPN%' (the extended path name, which includes the version) is used rather than '%CLEARCASE_PN%' because we need every version.
grep -v "/0" is here to exclude version 0 (/main/0, /main/myBranch/0, ...)
awk -F ~ "{print $1}" is used to only print the first part of each line:
C:\Prog\cc\test\test\a.txt~\main\mybranch\2 becomes C:\Prog\cc\test\test\a.txt
From there, the counting and sorting can begin:
sort to make sure every identical line is grouped
uniq -c to remove duplicate lines and precede each remaining line with a count of said duplicates
sort -rn (or sort /R for Windows) for having the most edited files at the top
head -100 for keeping only the 100 most edited files.
Again, GnuWin32 will come in handy for the Windows version of the one-liner.
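The counting half of the pipeline does not depend on ClearCase, so it can be tried on its own. A sketch; top_edited is a hypothetical helper name, the input lines are assumed to look like element~version as produced by the -fmt "%En~%Sn\n" format above, and anchoring the version-0 filter to the end of the line ('/0$') is a small tightening so that paths that merely contain "/0" somewhere are not dropped:

```shell
#!/usr/bin/env bash
# Sketch of the counting stage only. top_edited is a hypothetical helper that
# reads "element~version" lines (e.g. /vob/a.txt~/main/mybranch/2) on stdin.
top_edited() {
  local limit=${1:-100}
  grep -v '/0$' |              # drop version 0 of each branch (/main/0, /main/br/0)
    awk -F '~' '{print $1}' |  # keep only the element path
    sort |                     # group identical paths together
    uniq -c |                  # count versions per path
    sort -rn |                 # most edited first
    head -n "$limit"
}

# Usage: ct find ... -exec '...' | top_edited 100
```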
(See the answer above for the more complicated case: multiple branches.)
First, use a dynamic view: easier and quicker to update its content and fiddle with its config spec rules.
If your bug fixing has been done in a branch starting from a given label, set up a dynamic view with the following config spec:
element * .../MY_BRANCH/LATEST
element * MY_STARTING_LABEL
element * /main/LATEST
Then find all files along with their current version number (closely related to the number of edits):
ct find . -type f -exec "cleartool desc -fmt """%Ln\t\t%En\n""" """%CLEARCASE_PN%""""|sort /R|head -100
This is the Windows syntax (note the triple double-quotes around %CLEARCASE_PN%, which accommodate spaces within file names).
The 'head' command comes from the GnuWin32 library.
The most edited files are at the top of the list.
A Unix version would be:
$ ct find . -type f -exec 'cleartool desc -fmt "%Ln\t\t%En\n" "$CLEARCASE_PN"' | sort -rn | head -100
The most edited files would be at the top.
Do not forget that for metrics, the raw numbers are not enough, trends are important too.