Unable to filter rows which contain "Is a directory" by SED/AWK - sed

I run the following command:
md5deep find * | awk '{ print $1 }'
A sample of its output:
/Users/math/Documents/Articles/Number theory: Is a directory
258fe6853b1bfb2d07f512ff6bec52b1
/Users/math/Documents/Articles/Probability and statistics: Is a directory
4811bfb2ad04b9f4318049c01ebb52ef
8aae4ac3694658cf90005dbdea37b4d5
258fe6853b1bfb2d07f512ff6bec52b1
I have tried, unsuccessfully, to filter out the rows which contain "Is a directory" with sed:
md5deep find * | awk '{ print $1 }' | sed s/\/*//g
Its sample output is
/Users/math/Documents/Articles/Number theory: Is a directory
/Users/math/Documents/Articles/Topology: Is a directory
/Users/math/Documents/Articles/useful: Is a directory
How can I filter out each row which contains "Is a directory" using sed or awk?
[clarification]
I want to filter out the rows which contain Is a directory.

I have not used the md5deep tool, but I believe those lines are error messages; they are written to standard error instead of standard output, and so they go directly to your terminal rather than through the pipe. Thus, they won't be filtered by your sed command. You could filter them by merging your standard error and standard output streams, but there are better approaches, described below.
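A quick way to see this in general (an illustration unrelated to md5deep):
ls no-such-file | sed 's/directory/DIR/'
The "No such file or directory" message still shows up unchanged, because it travels on stderr and never enters the pipe.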
It looks like (I'm not sure because you are missing the backquotes) you are trying to call
md5deep `find *`
and find is returning all of the files and directories.
Some notes on what you might want to do:
It looks like md5deep has a -r for "recursive" option. So, you may want to try:
md5deep -r *
instead of the find command.
If you do wish to use a find command, you can limit it to files only by using -type f, instead of matching both files and directories. Also, you don't need to pass * into a find command (which may confuse find if there are files whose names look like the options that find understands); passing in . will search recursively through the current directory.
find . -type f
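Combining that with your original pipeline would look roughly like this (a sketch; note the backquotes, and that the unquoted command substitution will still break on file names containing spaces):
md5deep `find . -type f` | awk '{ print $1 }'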
In sed if you wish to use slashes in your pattern, it can be a pain to quote them correctly with \. You can instead choose a different character to delimit your regular expression; sed will use the first character after the s command as a delimiter. Your pattern is also lacking a .; in regular expressions, to indicate one instance of any character you use ., and to indicate "zero or more of the preceding expression" you use *, so .* indicates "zero or more of any character" (this is different from glob patterns, in which * alone means "zero or more of any character").
sed "s|/.*||g"
If you really do want to be including your standard error stream in your standard output, so it will pass through the pipe, then you can run:
md5deep `find *` 2>&1 | awk ...
If you just want to ignore stderr, you can redirect that to /dev/null, which is a special file that just discards anything that goes into it:
md5deep `find *` 2>/dev/null | awk ...
In summary, I think the command below will help you with your immediate problem, and the other suggestions listed above may help if I did not understand what you were looking for:
md5deep -r * | awk '{ print $1 }'

To specifically answer the clarification: how to filter out lines using awk and sed:
awk '/Is a directory/ {next} {print}'
sed '/Is a directory/d'
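For example, if the error lines were actually arriving on standard output (see the 2>&1 discussion above), either filter would drop them (a sketch):
md5deep `find *` 2>&1 | sed '/Is a directory/d' | awk '{ print $1 }'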

Why not use grep instead?
ie,
md5deep find * | grep "Is a directory" | awk '{ print $1 }'
Edit: I just re-read your question; if you want to remove the lines containing Is a directory, use the -v flag of grep, i.e.:
md5deep find * | grep -v "Is a directory" | awk '{ print $1 }'

I'm not intimately familiar with md5deep, but this may do something like what you are trying to do.
find . -type f -exec md5sum {} +

Related

How to rename a zero-padded file sequence efficiently in ZSH?

I have a picture sequence named with zero-padded numbers like so:
/path/to/file_07469.jpx
/path/to/file_07470.jpx
/path/to/file_07471.jpx
/path/to/file_07472.jpx
/path/to/file_07473.jpx
/path/to/file_07474.jpx
/path/to/file_07475.jpx
/path/to/file_07476.jpx
/path/to/file_07477.jpx
/path/to/file_07478.jpx
/path/to/file_07479.jpx
/path/to/file_07480.jpx
/path/to/file_07481.jpx
/path/to/file_07482.jpx
This is just an extract. It is thousands of files. I’d like to rename all files from a certain number on, adding / subtracting X. I’d love to use find with a regex.
#!/bin/zsh
shift=-1000
seqnumstart="$(echo "$1" | grep -Eo "\d+")"
bn="$(basename $1)"
bbn="$(echo "${bn%_*}")"
ext="$(echo "${bn##*.}")"
find "$(dirname $1)" -name "$bbn*$ext" -print0 | while read -d $'\0' file
do
seqnum="$(echo "$file" | grep -Eo "\d+")"
seqnum="$(echo "${seqnum#"${seqnum%%[!0]*}"}")"
if [[ "$seqnum" -ge "$seqnumstart" ]]; then
seqnumnew=$(($seqnum + $shift))
seqnumnew=$(printf %05d $seqnumnew)
filenew="$(echo $file | sed -E 's [0-9]+ '$seqnumnew' g')"
mv "$file" "$filenew"
fi
done
How can I improve my code? It is very slow. I'm on a Mac (zsh).
zmv is a utility in zsh that can do a lot of filename manipulation and looping for you. Try this:
zmv -n 'p/file_(<7000-7999>).jpx' 'p/file_$(printf "%05d" $(($1 - 1000))).jpx'
Some of the pieces:
zmv: an autoload function; use autoload -Uz zmv to make it available (this is usually added to .zshrc).
-n: no-op. With this option, zmv will just print what would have happened, giving you an idea if the command is correct. Remove this to actually mv the files.
(...): grouping operator for zmv. This identifies sections in the name that you want to change; this section is referenced in the 'to' argument as $1.
<7000-7999>: glob operator for a range. Note that leading zeroes are not always required.
$(printf "%05d" ...): zero-padding.
$((...)): arithmetic.
$1: reference to the parenthetical value in the 'from' argument. This is where zmv's magic happens: it is substituted for each matching filename.
As you likely know, you'll need to do the renaming in groups or in a specific order to avoid trying to change a name to a name that already exists. zmv will usually halt when it encounters collisions like that.
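For instance, the dry run above prints the mv commands it would execute, roughly of the form (illustrative output; the exact names depend on your directory):
mv -- p/file_07469.jpx p/file_06469.jpx
mv -- p/file_07470.jpx p/file_06470.jpx
Once that looks right, remove the -n and run it again to actually rename the files.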
This is much faster:
#!/bin/zsh
shift=1000
seqnumstart="$(echo "$1" | grep -Eo "\d+")"
lastfile="$(find "$(dirname $1)" -name "*.jpx" | sort | tail -1)"
seqnumend="$(echo "$lastfile" | grep -Eo "\d+")"
bn="$(basename $1)"
bbn="$(echo "${bn%_*}")"
#extension
ext="$(echo "${bn##*.}")"
#basepath before the padded number
bp="$(echo "${1%_*}")"
function buildpath {
echo "$bp"_"$1"."$ext"
}
for i in {$seqnumstart..$seqnumend}
do
unpad="$(echo $i | sed 's/^0*//')"
seqnumnew="$(($unpad + $shift))"
seqnumnewpad="$(printf %05d $seqnumnew)"
op="$(buildpath "$i")"
np="$(buildpath "$seqnumnewpad")"
mv "$op" "$np"
done
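Assuming the script above is saved as, say, shiftseq.zsh (a placeholder name used here just for illustration) and made executable, you invoke it with the first file of the range you want to shift:
./shiftseq.zsh /path/to/file_07469.jpx   # shiftseq.zsh is a placeholder name for the script above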

Strange result using grep

I've made a small script trying to search through a file looking for all occurrences of specific strings like this: a0002 b0590 c0964
The script goes like this:
#!/bin/sh
#include <stdio.h>
#
while read id;
do
awk {'print $1'} test.trans | grep -e "$id"
done < test.id
To simplify things, I made a stripped down version of the file I'm searching through (test.trans):
"a0001"
"a0002"
"b0586"
"b0587"
"b0588"
"b0589"
"b0590"
"b0591"
"b0852"
"b0952"
"a0002"
"b0587"
"c0952"
"c0964"
"c1783"
"c1786"
"c1787"
I have stored all the relevant search strings in a separate file named test.id which looks like this:
a0002
b0587
b0588
b0589
b0590
b0591
b0852
b0952
c0952
c0964
c1781
The idea is to pass each search string in the test.id file as a variable which is then used by grep to filter out all occurrences in the test.trans file.
However, when I run the script, grep only matches some of the strings. When I change the order of the search patterns in the test.id file, the result also changes. What am I doing wrong?
I consider myself a newbie in shell programming, and would appreciate any help.
I don't know the reason for your problem.
But here are some remarks that don't fit in a comment.
If you want to write a file to stdout (standard output), you can use cat test.trans instead of awk {'print $1'} test.trans.
But if you want grep to process a file, you don't need to read it with some other tool and pipe it to grep; grep can read the file directly: grep -e "$id" test.trans
If you already use awk, you don't need grep; you can achieve this with awk alone: awk '/'"$id"'/ {print $1}' test.trans. Finally, grep can filter on more than one pattern at a time. Instead of your whole loop, do
grep -f test.id test.trans
After some experimenting, I found out that it all boiled down to removing the carriage return (\r) from each line in the test.id file. The file came from a DOS machine, while my iMac uses the UNIX format, which ends lines with a newline (\n) only.
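For reference, one way to strip those carriage returns with standard tools (a sketch; adjust the file names as needed):
tr -d '\r' < test.id > test.id.unix && mv test.id.unix test.id   # test.id.unix is just a temporary name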

What is the purpose of the "-" in sh script line: ext="$(echo $ext | sed 's/\./\\./' -)"

I am porting a sh script that was apparently written using GNU implementation of sed to BSD implementation of sed. The exact line in the script with the original comment are:
# escape dot in file extension to grep it
ext="$(echo $ext | sed 's/\./\\./' -)"
I am able to reproduce a results with the following (obviously I am not exhausting all possibilities values for ext) :
ext=.h; ext="$(echo $ext | sed 's/\./\\./' -)"; echo [$ext]
Using GNU's implementation of sed the following is returned:
[\.h]
Using BSD's implementation of sed the following is returned:
sed: -: No such file or directory
[]
Executing ext=.h; ext="$(echo $ext | sed 's/\./\\./')"; echo [$ext] returns [\.h] for both implementation of sed.
I have looked at both GNU and BSD's sed's man page have not found anything about the trailing "-". Googling for sed with a "-" is not very fruitful either.
Is the "-" a typo?
Is the "-" needed for some an unexpected value of $ext?
Is the issue not with sed, but rather with sh?
Can someone direct me to what I should be looking at, or even better, explain what the purpose of the "-" is?
On my system, that syntax isn't documented in the man page, but it is in the 'info' page:
sed OPTIONS... [SCRIPT] [INPUTFILE...]
If you do not specify INPUTFILE, or if INPUTFILE is '-', sed filters the contents of the standard input.
Given that particular usage, I think you could leave off the '-' and it should still work.
You got your specific question answered BUT your script is all wrong. Take a look at this:
# escape dot in file extension to grep it
ext="$(echo $ext | sed 's/\./\\./')"
The main problems with that are:
You're not quoting your variable ($ext), so it will go through file name expansion, and if it contains spaces it will be passed to echo as multiple arguments instead of one. Do this instead:
ext="$(echo "$ext" | sed 's/\./\\./')"
You're using an external command (sed) and a pipe to do something the shell can do trivially itself. Do this instead:
ext="${ext/./\.}"
Worst of all: you're escaping the RE metacharacter (.) in your variable so you can pass it to grep to do an RE search on it as if it were a plain string. That doesn't make sense and becomes intractable in the general case, where your variable could contain any combination of RE metacharacters. Just do a string search instead of an RE search and you won't need to escape anything: skip both of the substitution commands above and use one of these instead of grep "$ext" file:
grep -F "$ext" file
fgrep "$ext" file
awk -v ext="$ext" 'index($0,ext)' file
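A quick illustration of the difference, using made-up input:
printf 'foo.h\nbash\n' | grep '.h'     # prints both lines: the unescaped . matches any character
printf 'foo.h\nbash\n' | grep -F '.h'  # prints only foo.h: the pattern is treated as a literal string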

Dynamically building an exclude list in both rsync & egrep format

I wonder if anyone out there can assist me in trying to solve an issue I am having.
I have written a set of shell scripts with the purpose of auditing remote file systems based on a GOLD build on an audit server.
As part of this, I do the following:
1) Use rsync to work out any new files or directories, any modified or removed files
2) Use find ${source_filesystem} -ls on both local & remote to work out permissions differences
Now as part of this there are certain files or directories that I am excluding, i.e. logs, trace files etc.
So in order to achieve this I use 2 methods:
1) RSYNC - I have an exclude-list that is added using --exclude-from flag
2) find -ls - I use an egrep -v statement to exclude the same entries as the rsync exclude-list:
e.g. find -L ${source_filesystem} -ls | egrep -v "$SEXCLUDE_supt"
So my issue is that I have to maintain 2 separate lists, and this is a bit of an admin nightmare.
I am looking for some assistance or advice on whether it is possible to dynamically build a list of exclusions that can be used for both rsync and the find -ls.
Here is the format of what the exclude lists look like:
RSYNC:
*.log
*.out
*.csv
logs
shared
tracing
jdk*
8.6_Code
rpsupport
dbarchive
inarchive
comms
PR116PICL
**/lost+found*/
dlxwhsr*
regression
tmp
working
investigation
Investigation
dcsserver_weblogic_*.ear
dcswebrdtEAR_weblogic_*.ear
FIND:
SEXCLUDE_supt="\.log|\.out|\.csv|logs|shared|PR116PICL|tracing|lost\+found|jdk|8\.6\_Code|rpsupport|dbarchive|inarchive|comms|dlxwhsr|regression|tmp|working|investigation|Investigation|dcsserver_weblogic_|dcswebrdtEAR_weblogic_"
You don't need to create a second list for your find command. grep can handle a list of patterns using the -f flag. From the manual:
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file contains zero
patterns, and therefore matches nothing. (-f is specified by POSIX.)
Here's what I'd do:
find -L ${source_filesystem} -ls | grep -Evf your_rsync_exclude_file_here
This should also work for filenames containing newlines and spaces. Please let me know how it goes.
In the end the grep -Evf approach was a bit of a nightmare, as rsync doesn't support that kind of regex; its exclude patterns use a different syntax.
So I then pursued my other idea of dynamically building the exclude list for egrep by parsing the rsync exclude-list and building the variable on the fly to pass into egrep.
This is the method I used:
#!/bin/ksh
# Create Signature of current build
AFS=$1
#Create Signature File
crSig()
{
find -L ${SRC} -ls | egrep -v "$SEXCLUDE" | awk '{fws = ""; for (i = 11; i <= NF; i++) fws = fws $i " "; print $3, $6, fws}' | sort >${BASE}/${SIFI}.${AFS}
}
#Setup SRC, TRG & SCROOT
LoadAuditReqs()
{
export SRC=`grep ${AFS} ${CONF}/fileSystem.properties | awk {'print $2'}`
export TRG=`grep ${AFS} ${CONF}/fileSystem.properties | awk {'print $3'}`
export SCROOT=`grep ${AFS} ${CONF}/fileSystem.properties | awk {'print $4'}`
export BEXCLUDE=$(sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' ${CONF}/exclude-list.${AFS} | tr "\n" "|")
export SEXCLUDE=$(echo ${BEXCLUDE} | sed 's/\(.*\)|/\1/')
}
#Load Properties File
LoadProperties()
{
. /users/rpapp/rpmonit/audit_tool/conf/environment.properties
}
#Functions
LoadProperties
LoadAuditReqs
crSig
So with these new variables:
export BEXCLUDE=$(sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' ${CONF}/exclude-list.${AFS} | tr "\n" "|")
export SEXCLUDE=$(echo ${BEXCLUDE} | sed 's/\(.*\)|/\1/')
I use them to remove "*" and "/", then match my special characters and prepend "\" to escape them.
Then, using tr, I replace each newline with "|", and run that output through sed again to remove the trailing "|". That produces the $SEXCLUDE variable used by egrep in the crSig function.
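For illustration, here is roughly what those two steps produce for a couple of entries from the rsync list above (a sketch; the second line shows the output):
printf '*.log\n8.6_Code\n' | sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' | tr "\n" "|"
\.log|8\.6\_Code|
The final sed in SEXCLUDE then just strips the trailing "|".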
What do you think?

unix find and replace text in dir and subdirs

I'm trying to change the name of "my-silly-home-page-name.html" to "index.html" in all documents within a given master directory and subdirs.
I saw this: Shell script - search and replace text in multiple files using a list of strings.
And this: How to change all occurrences of a word in all files in a directory
I have tried this:
grep -r "my-silly-home-page-name.html" .
This finds the lines on which the text exists, but now I would like to substitute 'my-silly-home-page-name' for 'index'.
How would I do this with sed or perl?
Or do I even need sed/perl?
Something like:
grep -r "my-silly-home-page-name.html" . | sed 's/$1/'index'/g'
?
Also; I am trying this with perl, and I try the following:
perl -i -p -e 's/my-silly-home-page-name\.html/index\.html/g' *
This works, but I get an error when perl encounters directories, saying "Can't do inplace edit: SOMEDIR-NAME is not a regular file, <> line N"
Thanks,
jml
find . -type f -exec \
perl -i -pe's/my-silly-home-page-name(?=\.html)/index/g' {} +
Or if your find doesn't support -exec +,
find . -type f -print0 | xargs -0 \
perl -i -pe's/my-silly-home-page-name(?=\.html)/index/g'
Both pass to Perl as arguments as many names at a time as possible. Both work with any file name, including those that contain newlines.
If you are on Windows and you are using a Windows build of Perl (as opposed to a cygwin build), -i won't work unless you also do a backup of the original. Change -i to -i.bak. You can then go and delete the backups using
find . -type f -name '*.bak' -delete
This should do the job:
find . -type f -print0 | xargs -0 sed -e 's/my-silly-home-page-name\.html/index\.html/g' -i
Basically it recursively gathers all the files under the given directory (. in the example) with find and, through xargs, runs sed on them with the same substitution command as in the perl command from the question.
Regarding the question about sed vs. perl, I'd say that you should use the one you're more comfortable with since I don't expect huge differences (the substitution command is the same one after all).
There are probably better ways to do this but you can use:
find . -name oldname.html | perl -e 'map { s/[\r\n]//g; $old = $_; s/oldname\.html$/newname.html/; rename $old,$_ } <>';
Fyi, grep searches for a pattern; find searches for files.