Shell Script While Loop and Perl Printing Extra Line - perl

I modified a script, which a poster on another board gave me, to better suit my needs. InputConfig.txt contains directories to find files in, the inbound file age (second column), and the outbound file age (third column). The inbound/outbound numbers for each directory don't have to be the same; I just made them so. Most important, the VI and AB directories have specific ages to check against, while everything else uses the generic 30 minutes.
The Perl statement's purpose is to capture the timestamp of each file found. The problem is that printf prints an extra line, because the while loop reads three lines but I only need the two lines (or however many match) to print.
I don't know Perl well enough to fix it, if the problem is indeed with Perl.
Appreciate the help.
InputConfig.txt
/home/MF/NA/CD 30 30
/home/MF/NA/CD/VI 10 10
/home/MF/NA/CD/AB 15 15
Script
#!/bin/ksh
VI=*/CD/VI/*
AB=*/CD/AB/*
cat InputConfig.txt | while read DIR IT OT; do
TS=$(find "${DIR}" -type f -path "${DIR}/*/inbound/*" -mmin "+${IT}" ! -path "${VI}" ! -path "${AB}")
TS=$(find "${DIR}" -type f -path "${DIR}/*/outbound/*.done" -mmin "+${OT}")
TS=$(find "${DIR}" -type f -path "${DIR}/inbound/*" -mmin +"${IT}")
perl -e 'printf("%s,%d\n", $ARGV[0], (stat("$ARGV[0]"))[9]);' "$TS"
done
Output:
,0
/home/MF/NA/CD/VI/inbound/vis,1492716168
/home/MF/NA/CD/AB/inbound/abc,1492716485
Desired Output
/home/MF/NA/CD/VI/inbound/vis,1492716168
/home/MF/NA/CD/AB/inbound/abc,1492716485

The script has many problems:
It assigns the TS variable three times in a row, so only the last assignment is used. The first two find runs are pointless, so you probably want to achieve something else.
You're getting the mtime using Perl. That's a good idea if Perl reads the filenames from stdin rather than being started once per file; otherwise the shell's stat command would be faster. In other words, you want to read the filenames from stdin (a stat-based sketch follows the script below).
Always use read -r (unless you know why you don't want the -r). :)
Useless use of cat: just redirect the whole loop's input from the file.
So the script could probably look like this:
#!/bin/ksh
VI=*/CD/VI/*
AB=*/CD/AB/*
while read -r DIR IT OT; do
find "${DIR}" -type f -path "${DIR}/*/inbound/*" -mmin "+${IT}" ! -path "${VI}" ! -path "${AB}" -print
find "${DIR}" -type f -path "${DIR}/*/outbound/*.done" -mmin "+${OT}" -print
find "${DIR}" -type f -path "${DIR}/inbound/*" -mmin +"${IT}" -print
done < InputConfig.txt | perl -lne 'printf "%s,%d\n", $_, (stat($_))[9];'
This is more of a ksh/shell question than a Perl one. :)
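For completeness, a minimal sketch of the "stat instead of Perl" variant mentioned in the second point above: keep the same loop, and replace the final perl stage with GNU coreutils stat (%n = file name, %Y = mtime in epoch seconds; stat -c and xargs -r are GNU extensions, not POSIX):
done < InputConfig.txt | xargs -r stat -c '%n,%Y'
The output format (path,epoch) is the same as the Perl version; like the Perl version, it assumes filenames without whitespace.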

If I have understood correctly, you want to use Perl to display the name and the modification time of the files found by the previous find commands. Something like this should work:
#!/bin/ksh
VI=*/CD/VI/*
AB=*/CD/AB/*
cat InputConfig.txt | while read DIR IT OT; do
(find "${DIR}" -type f -path "${DIR}/*/inbound/*" -mmin "+${IT}" ! -path "${VI}" ! -path "${AB}" ;
find "${DIR}" -type f -path "${DIR}/*/outbound/*.done" -mmin "+${OT}" ;
find "${DIR}" -type f -path "${DIR}/inbound/*" -mmin +"${IT}") |
xargs -l perl -e 'printf("%s,%d\n", $ARGV[0], (stat("$ARGV[0]"))[9]);'
done

Thanks, everyone, for your input, but I went back to my original if-else style of script, since the TIBCO project from which I call the script did not like the output format.
My script, invoked like ./CDFindFiles /home/NA/CD/:
#!/bin/ksh
FOLDER=$1
VI=*/CD/VI/inbound
AB=*/CD/AB/inbound
find "$FOLDER" -type f -path "${FOLDER}*/inbound/*" -o -path "${FOLDER}*/outbound/*.done" | while read line;
do
MODTIME=$(perl -e 'printf "%d\n",(-M shift)*24*60' "$line")
if [[ "$line" == *"$VI"* && "$MODTIME" -gt 90 || "$line" == *"$AB"* && "$MODTIME" -gt 180 ]]; then
perl -e 'printf("%s,%d\n", $ARGV[0], (stat("$ARGV[0]"))[9]);' "$line"
elif [[ "$line" != *"$VI"* && "$line" != *"$AB"* && "$MODTIME" -gt 30 ]]; then
perl -e 'printf("%s,%d\n", $ARGV[0], (stat("$ARGV[0]"))[9]);' "$line"
fi
done
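As an aside, the two Perl calls per file can be collapsed into a single Perl process that reads the filenames from stdin, as the first answer suggested. A minimal sketch, untested, assuming the same thresholds (90/180/30 minutes) and path patterns:
#!/bin/ksh
FOLDER=$1
find "$FOLDER" -type f \( -path "${FOLDER}*/inbound/*" -o -path "${FOLDER}*/outbound/*.done" \) |
perl -lne '
    my $mtime = (stat $_)[9];           # epoch mtime, as in the original printf
    my $age   = (time - $mtime) / 60;   # minutes since last modification
    my $limit = m{/CD/VI/inbound} ? 90
              : m{/CD/AB/inbound} ? 180
              :                     30;
    printf "%s,%d\n", $_, $mtime if $age > $limit;
'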

Related

Split all of the files in a directory by 1000 lines and process them through a perl script

I used this line to try and split all of the files in a directory into smaller files and put them in a new directory.
cd /path/to/files/ && find . -maxdepth 1 -type f -exec split -l 1000 '{}' "path/to/files/1/prefix {}" \;
The result was 'no file found', so how do I make this work so that I split all of the files in a directory into smaller 1000-line files and place them in a new directory?
Later on...
I tried many variations and this is not working. I read in another article that split cannot operate on multiple files. Do I need to write a shell script, or how do I do this?
I had a bright idea to use a loop. So, I researched the 'for' loop and the following worked:
for f in *.txt.*; do echo "Processing $f file..."; split -l 1000 "$f" 1split.ALLEMAILS.txt. ; done
'.txt.' appears in all of the filenames in the working directory. The 'echo' command is optional. For the 'split' command, instead of naming one file, I used $f as defined by the 'for' line.
The only thing I would like to have been able to do is move all of these to another directory in the command.
Right now, I am stuck on the find command for moving all matching files. This is what I have done so far that is not working:
find . -type f -name '1split.*' -exec mv {} new/directory/{} \;
I get the error ' not a directory ' ; or I tried:
find . -type f -name '1split.*' -exec mv * . 1/ \;
and I get ' no such file or directory '
Any ideas?
I found that this command moved ALL of the files to the new directory, instead of only the ones matching the criteria '1split.*'.
So, the answers to my questions are:
for f in *.txt.*; do echo "Processing $f file..."; split -l 1000 "$f" 1split.ALLEMAILS.txt. ; done
and
mv *searchcriteria /new/directory/path/
I did not need a find command for this after all. So, combining both of these would have done the trick:
for f in *.txt.*; do echo "Processing $f file..."; split -l 1000 "$f" 1split.ALLEMAILS.txt. ; done
mv *searchcriteria /new/directory/path/ && echo "done."
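Since the last argument to split is only a prefix, it can include a directory, which avoids the separate mv step entirely. A minimal sketch (the destination path is hypothetical):
mkdir -p /new/directory/path
for f in *.txt.*; do split -l 1000 "$f" "/new/directory/path/1split.$f."; done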
---later on...
I found that this basically took 1 file and processed it.
I fixed that with a small shell script:
#!/bin/sh
for f in /file/path/*searchcriteria ; ## this was 'split.*' in my case
do echo "Processing $f in /file/path/..." ;
perl script.pl --script=options "$f" > output.file ;
done ;
echo "done."

Unix Shell scripting find and replace string in specific files in subfolders

I want to replace the string "Solve the problem" with "Choose the best answer" in only the XML files that exist in the subfolders of a folder. I have put together a script that helps me do this, but there are three problems:
It also replaces the content of the script
It replaces the text in all files in the subfolders (but I want only the XML files to change).
I want to display error messages (preferably as text output) if the text is not found in a particular subfolder or file.
So can you please help me modify my existing script so that I can solve the above three problems?
The script I have is :
find -type f | xargs sed -i "s/Solve the problem/Choose the best answer/g"
Using bash and sed:
search='Solve the problem'
replace='Choose the best answer'
for file in $(find . -name '*.xml'); do
    grep -q "$search" "$file"
    if [ $? -ne 0 ]; then
        echo "Search string not found in $file!"
    else
        sed -i "s/$search/$replace/" "$file"
    fi
done
find -type f -name "*.xml" | xargs sed -i "s/Solve the problem/Choose the best answer/g"
I'm not sure I understand issue 3.
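If issue 3 means reporting the files in which the search text does not occur, GNU grep's -L option (list files without a match) does that in one line; a minimal sketch:
grep -rL --include='*.xml' 'Solve the problem' .
Each path printed is an XML file that the sed would leave unchanged.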

find command for the newest 500 files in a directory tree and also be POSIX compliant

I'm looking for a one-line shell script or Unix command to find the newest 500 files in a directory tree. The major constraints are that it should be POSIX compliant and that the directory can contain a huge number of files.
From the link below I found a Perl script which helped:
find . -type f -print | perl -l -ne ' $_{$_} = -M; END { $,="\n"; print sort {$_{$b} <=> $_{$a}} keys %_ }' | head -n 500
How to recursively find and list the latest modified files in a directory with subdirectories and times?
Any more comments are most welcome. Thanks, all.
How about this:
POSIX ls and head
ls -tc DIR | head -n 500
find . -type f -print | perl -l -ne ' $_{$_} = -M; END { $,="\n"; print sort {$_{$b} <=> $_{$a}} keys %_ }' | head -n 500
It should be the contrary for the sort: $_{$a} <=> $_{$b}
The head can be avoided: print+(...)[0..499]
The find too with a recursive call:
perl -e 'sub R{($_)=@_;map{-d$_?&R($_):$_}<$_/*>}print$_,$/for(sort{-M$a<=>-M$b}R".")[0..499]'
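Unpacked for readability, that golfed one-liner is roughly this (a sketch; like the original, it skips dotfiles and prints blank lines if there are fewer than 500 files):
perl -e '
    sub R {                                  # recursively collect plain files
        my ($d) = @_;
        map { -d $_ ? R($_) : $_ } glob("$d/*");
    }
    print "$_\n" for (sort { -M $a <=> -M $b } R("."))[0 .. 499];
'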
Or with a Unix command (though it may fail if there are too many arguments):
find . -type f -exec ls -1t {} + | head -500
find . -type f -print0 | xargs -0 ls -1t | head -500
find . -type f -exec stat -c %Y:%n {} \; |
sort -rn | sed -e 's/.*://' -e 500q
This sorts on mtime; it can be changed to ctime or atime by using %Z or %X in the format string, but stat is not POSIX.
There is no 100% reliable POSIX way of doing this with shell scripting.
A POSIX C program will do it easily though, assuming you define newest by either last-modified file content or last status change. If you mean creation time, there is no POSIX way, and possibly no solution at all, depending on the file system used.

Why does "find . -name *.txt | xargs du -hc" give multiple totals?

I have a large set of directories for which I'm trying to calculate the sum total size of several hundred .txt files. I tried this, which mostly works:
find . -name *.txt | xargs du -hc
But instead of giving me one total at the end, I get several. My guess is that the pipe will only pass on so many lines of find's output at a time, and du just operates on each batch as it comes. Is there a way around this?
Thanks!
Alex
How about using the --files0-from option to du? You'd have to generate the null-terminated file list appropriately:
find . -name "*txt" -exec echo -n -e {}"\0" \; | du -hc --files0-from=-
This works correctly on my system.
find . -iname '*.txt' -print0 | du --files0-from=-
and if you want to search for several different extensions, it's best to do:
find . -type f -print0 | grep -azZEi '\.(te?xt|rtf|docx?|wps)$' | du --files0-from=-
The xargs program breaks things up into batches, to account for the maximum length of a Unix command line. It's still more efficient than running your subcommand once per file but, for a long list of inputs, it will run the command multiple times, so that each run is short enough not to cause issues.
Because of this, you're likely seeing one output line per "batch" that xargs needs to run.
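You can make the batching visible by forcing tiny batches with -n; a toy example:
$ printf '%s\n' a b c d e | xargs -n 2 echo run:
run: a b
run: c d
run: e
Each "run:" line is one invocation, just as each du total comes from one invocation.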
In case you find it useful or interesting, the man page can be found online here: http://unixhelp.ed.ac.uk/CGI/man-cgi?xargs
One other thing to note (this may be a typo in your post or my misunderstanding) is that you have the *.txt unescaped/unquoted. I.e., you have
find . -name *.txt | xargs du -hc
where you probably want
find . -name \*.txt | xargs du -hc
The difference being that the shell may expand the * into the list of matching filenames in the current directory, rather than passing the * to find, which would use it as a pattern.
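A toy demonstration of that expansion, in a directory containing a.txt and b.txt:
$ echo find . -name *.txt
find . -name a.txt b.txt
With more than one match, find is handed extra arguments and (in GNU find) fails with "paths must precede expression".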
Another simple solution:
find . -name '*.txt' -print0 | xargs -0 du -hc
One alternate solution is to use bash for loop:
for i in `find . -name '*.txt'`; do du -hc "$i" | grep -v 'total'; done
This is good for when you need more control of what happens in the loop.
xargs busts its input into reasonable-sized chunks - what you're seeing are totals for each of those chunks. Check the man page for xargs on ways to configure its handling of input.
One alternate solution is to use awk:
find . -name "*.txt" -exec ls -lt {} \; | awk -F " " 'BEGIN { sum=0 } { sum+=$5 } END { print sum }'

Using sed to grab filename from full path?

I'm new to sed, and need to grab just the filename from the output of find. I need find to output the whole path for another part of my script, but I want to print just the filename without the path. I also need to match starting from the beginning of the line, not from the end. In English: I want to match the first group of characters ending with ".txt" and not containing a "/". Here's my attempt, which doesn't work:
ryan@fizz:~$ find /home/ryan/Desktop/test/ -type f -name \*.txt
/home/ryan/Desktop/test/two.txt
/home/ryan/Desktop/test/one.txt
ryan@fizz:~$ find /home/ryan/Desktop/test/ -type f -name \*.txt | sed s:^.*/[^*.txt]::g
esktop/test/two.txt
ne.txt
Here's the output I want:
two.txt
one.txt
OK, so the solutions offered answer my original question, but I guess I asked it wrong. I don't want to kill the rest of the line past the file suffix I'm searching for.
So, to be more clear, given the following:
bash$ new_mp3s=`find mp3Dir -type f -name \*.mp3` && cp -rfv $new_mp3s dest
`/mp3Dir/one.mp3' -> `/dest/one.mp3'
`/mp3Dir/two.mp3' -> `/dest/two.mp3'
What I want is:
bash$ new_mp3s=`find mp3Dir -type f -name \*.mp3` && cp -rfv $new_mp3s dest | sed ???
`one.mp3' -> `/dest'
`two.mp3' -> `/dest'
Sorry for the confusion. My original question just covered the first part of what I'm trying to do.
2nd edit:
here's what I've come up with:
DEST=/tmp && cp -rfv `find /mp3Dir -type f -name \*.mp3` $DEST | sed -e 's:[^`].*/::' -e "s:$: -> $DEST:"
This isn't quite what I want, though. Instead of setting the destination directory as a shell variable, I would like to change the first sed operation so it only changes the cp output before the "->" on each line, leaving the second part of the cp output to operate on with another '-e'.
3rd edit:
I haven't figured this out using only sed regexes yet, but the following does the job using Perl:
cp -rfv `find /mp3Dir -type f -name \*.mp3` /tmp | perl -pe "s:.*/(.*.mp3).*\`(.*/).*.mp3\'$:\$1 -> \$2:"
I'd like to do it in sed though.
Something like this should do the trick:
find yourdir -type f -name \*.txt | sed 's/.*\///'
or, slightly clearer,
find yourdir -type f -name \*.txt | sed 's:.*/::'
Why don't you use basename instead?
find /mydir | xargs -I{} basename {}
No need for external tools if you're using GNU find:
find /path -name "*.txt" -printf "%f\n"
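Applied to the question's test directory, that looks like:
$ find /home/ryan/Desktop/test/ -type f -name '*.txt' -printf '%f\n'
two.txt
one.txt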
I landed on the question based on the title: using sed to grab filename from fullpath.
So, using sed, the following is what worked for me...
FILENAME=$(echo "$FULLPATH" | sed -n 's/^\(.*\/\)*\(.*\)/\2/p')
The first group captures any directories from the path; this is discarded.
The second group captures the text following the last slash (/); this is returned.
Examples:
echo "/test/file.txt" | sed -n 's/^\(.*\/\)*\(.*\)/\2/p'
file.txt
echo "/test/asd/asd/entrypoint.sh" | sed -n 's/^\(.*\/\)*\(.*\)/\2/p'
entrypoint.sh
echo "/test/asd/asd/default.json" | sed -n 's/^\(.*\/\)*\(.*\)/\2/p'
default.json
find /mydir | awk -F'/' '{print $NF}'
path="parentdir2/parentdir1/parentdir0/dir/FileName"
name=${path##*/}
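For example, in an interactive shell:
$ path="parentdir2/parentdir1/parentdir0/dir/FileName"
$ echo "${path##*/}"
FileName
${path##*/} strips the longest prefix matching */, i.e. everything up to and including the last slash.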