Bash or Python efficient substring matching and filtering

I have a set of filenames in a directory, some of which are likely to share substrings that are not known in advance. This is a sorting exercise: I want to move the files that share the longest common substring (letters in the same order) into a subdirectory named after the length of that match, then proceed from the longest match to the shortest until no matches of two or more letters remain. Ignore extensions, ignore case, and ignore special characters.
Example.
AfricanElephant.jpg
elephant.jpg
grant.png
ant.png
el_gordo.tif
snowbell.png
Starting from maximum length matches to minimum length matches will result in:
./8/AfricanElephant.jpg and ./8/elephant.jpg
./3/grant.png and ./3/ant.png
./2/snowbell.png and ./2/el_gordo.tif
I'm completely lost on an efficient bash or Python way to do what seems to be a complex sort.
I found some awk code which is almost there:
{
count=0
while ( match($0,/elephant/) ) {
count++
$0=substr($0,RSTART+1)
}
print count
}
Here temp.txt contains the list of filenames, and the script is invoked as, for example:
awk -f test_match.awk temp.txt
The drawbacks are that a) it is hardwired to look for "elephant" (I don't know how to make it take an input string, rather than a file, plus a test string to count against), and b) I really just want to call a bash function that does the sort as specified.
If I had that, I could wrap some bash script around this core awk code to make it work.
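For drawback a), the counting loop can take both strings as parameters. The following is a minimal Python sketch of the same overlapping count that the awk loop performs, not a drop-in replacement for it. The name count_overlapping is hypothetical, and the case-insensitive comparison is an assumption taken from the requirements above.

```python
def count_overlapping(text, needle):
    """Count occurrences of needle in text, overlaps included, ignoring case."""
    text, needle = text.lower(), needle.lower()
    if not needle:
        return 0  # an empty search string would otherwise match everywhere
    count = 0
    start = text.find(needle)
    while start != -1:
        count += 1
        # Resume one character after the last hit, like substr($0, RSTART+1).
        start = text.find(needle, start + 1)
    return count
```

Calling count_overlapping("AfricanElephant.jpg", "elephant") gives 1, and count_overlapping("aaaa", "aa") gives 3 because overlapping hits are counted.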

function longest_common_substrings () {
    shopt -s nocasematch
    # Score every ordered pair of regular files by the length of their
    # longest common substring (extension ignored), highest score first.
    # Assumes filenames contain no whitespace.
    for file1 in * ; do
        for file in * ; do
            [[ -f "$file1" && -f "$file" ]] || continue
            [[ "$file" == "$file1" ]] && continue
            base1=$(basename "$file" | cut -d. -f1)
            base2=$(basename "$file1" | cut -d. -f1)
            echo -n "$file $file1 "
            "$HOME/Scripts/longest_common_substring.sh" "$base1" "$base2" | tr -d '\n' | wc -c | awk '{$1=$1;print}'
        done
    done | sort -rn -k3 | awk '{ print $1, $3 }' > /tmp/filesort_substring.txt
    # Move each file into the directory named after its best score;
    # later, lower-scoring lines are skipped because the file is already gone.
    while IFS= read -r line; do
        file_to_move=$(echo "$line" | awk '{ print $1 }')
        directory_to_move_to=$(echo "$line" | awk '{ print $2 }')
        if [[ -f "$file_to_move" ]]; then
            mkdir -p "$directory_to_move_to"
            \gmv -b "$file_to_move" "$directory_to_move_to"
        fi
    done < /tmp/filesort_substring.txt
    shopt -u nocasematch
}
where $HOME/Scripts/longest_common_substring.sh is
#!/bin/bash
# Print the longest common substring of $1 and $2, ignoring case.
shopt -s nocasematch
if ((${#1}>${#2})); then
    long=$1 short=$2
else
    long=$2 short=$1
fi
lshort=${#short}
score=0
for ((i=0;i<lshort-score;++i)); do
    for ((l=score+1;l<=lshort-i;++l)); do
        sub=${short:i:l}
        # Quote $sub so that glob characters in a filename are matched literally.
        [[ $long != *"$sub"* ]] && break
        subfound=$sub score=$l
    done
done
if ((score)); then
    echo "$subfound"
fi
shopt -u nocasematch
Kudos to the original solution for computing the match in the script, which I found elsewhere on this site.
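Since the question also allows Python, the same idea can be sketched without an external helper script. This is a minimal sketch, not a tuned implementation: the names normalize, lcs_length and plan_moves are hypothetical, the longest common substring is found with a plain dynamic-programming table, and the function only returns a filename-to-directory plan instead of moving anything. It assumes that "ignore special characters" means dropping everything except letters and digits, and that the extension is whatever follows the first dot (as cut -d. -f1 does).

```python
import re
from itertools import combinations


def normalize(filename):
    """Drop the extension, lowercase, and keep only letters and digits."""
    stem = filename.split(".", 1)[0]
    return re.sub(r"[^a-z0-9]", "", stem.lower())


def lcs_length(a, b):
    """Length of the longest common substring of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the common run that ended one character earlier.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def plan_moves(filenames, min_len=2):
    """Map each filename to the directory named after its best match length."""
    keys = {name: normalize(name) for name in filenames}
    best = {name: 0 for name in filenames}
    for x, y in combinations(filenames, 2):
        n = lcs_length(keys[x], keys[y])
        best[x] = max(best[x], n)
        best[y] = max(best[y], n)
    # Files whose best match is shorter than min_len stay where they are.
    return {name: str(n) for name, n in best.items() if n >= min_len}
```

For the six example files this plan puts AfricanElephant.jpg and elephant.jpg in 8, grant.png and ant.png in 3, and el_gordo.tif and snowbell.png in 2. The actual moves could then be done with os.makedirs and shutil.move over the returned dictionary.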

Related

iterate over stdin fish (context: filter music files by genre grep)

I have this:
for file in **/*.ogg;
if ffprobe "$file" 2>&1 | sed -E -n 's/^ *GENRE *: (.*)/\1/p' | grep -q "$argv";
echo "$file"
else
end
end
but I would like to turn it into a function which will take a list of filenames as standard-input:
$ find . -maxdepth 1 -not -type d -exec du -h {} + | cut -f2 | filterByGenre Classical

You could read standard input line by line:
function filterByGenre
    while read line
        # do stuff with $line
    end
end
or read everything first and then loop over the lines:
function filterByGenre
    set listOfLines (cat)
    for line in $listOfLines
        # do stuff with $line
    end
end
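For comparison, the same stdin-driven filter can be sketched in Python. This is only an illustration under stated assumptions: genres_in, probe and filter_by_genre are hypothetical names, the GENRE pattern is copied from the sed expression in the question, and the genre test is a plain substring check rather than the regular-expression match that grep performs.

```python
import re
import subprocess

# Same pattern as the sed expression: optional spaces, GENRE, a colon, the value.
GENRE_LINE = re.compile(r"^ *GENRE *: (.*)$", re.MULTILINE)


def genres_in(probe_output):
    """Return every GENRE value found in ffprobe's text output."""
    return GENRE_LINE.findall(probe_output)


def probe(path):
    """Run ffprobe and return its combined output (the shell version uses 2>&1)."""
    result = subprocess.run(["ffprobe", path], capture_output=True, text=True)
    return result.stdout + result.stderr


def filter_by_genre(lines, genre, run_probe=probe):
    """Yield each path from lines whose GENRE tag contains genre."""
    for line in lines:
        path = line.rstrip("\n")
        if any(genre in value for value in genres_in(run_probe(path))):
            yield path
```

Used as a filter, the loop would be `for path in filter_by_genre(sys.stdin, sys.argv[1]): print(path)` after importing sys. The run_probe parameter exists only so the probing step can be swapped out.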

problems while reading log file with tail -n0 -F

I am monitoring the Asterisk log file for peers that go offline.
The if branch works correctly, but in the second branch the sed command has no effect, although the echo command works. What do I need to change?
tail -n0 -F /var/log/asterisk/messages | \
while read LINE
do
if echo "$LINE" | /bin/grep -q "is now UNREACHABLE!"
then
EXTEN=$(echo $LINE | /bin/grep -o -P "(?<=\').*(?=\')")
echo "$EXTEN is now UNREACHABLE!"
CALLERID=$(/bin/sed -n '/^\['"$EXTEN"'\]/,/^\[.*\]/{/^callerid*/p}' "$SIP" | /usr/bin/awk -F'=' '{ print $2 }')
if .......
then
.......
fi
elif echo "$LINE" | /bin/grep -q "is now REACHABLE!"
then
EXTEN=$(echo $LINE | /bin/grep -o -P "(?<=\').*(?=\')")
echo "$EXTEN is now REACHABLE!"
if /bin/grep -qi "^$EXTEN;" $OFFLINE; then
/bin/sed -i '/^$EXTEN;/d' $OFFLINE
fi
fi
done

You have a quoting problem: you've used single quotes around a string that contains a shell variable, so $EXTEN is never expanded and sed looks for the literal text $EXTEN:
if /bin/grep -qi "^$EXTEN;" $OFFLINE; then
/bin/sed -i '/^$EXTEN;/d' $OFFLINE
fi
Try using double quotes instead:
if /bin/grep -qi "^$EXTEN;" $OFFLINE; then
/bin/sed -i "/^$EXTEN;/d" $OFFLINE
fi
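What the corrected sed command does to the offline list can be sketched in Python as follows. The name drop_extension is hypothetical, and the sketch assumes, as the grep pattern suggests, that each line of the file starts with the extension followed by a semicolon.

```python
def drop_extension(lines, exten):
    """Return lines without the entries that start with '<exten>;'."""
    prefix = f"{exten};"
    return [line for line in lines if not line.startswith(prefix)]
```

One difference is worth noting: sed interprets the expanded $EXTEN as part of a regular expression, whereas startswith compares it literally, so an extension containing characters such as . or * behaves differently in the two versions.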

How do I fix 'command not found' that popped out when I tried 'egrep' from a variable?

I wanted to write a program that finds all the lines in a file, mydata, that contain every one of the given search terms (factors). I tried to egrep the first term from mydata and save the result in a variable a. Then I tried to egrep the next term from a and save the result in a again, and so on until every term had been applied. But when I executed the program, it said
"command not found" in line 14.
if [ $# -eq 0 ]
then
echo -e "Usage: phoneA searchfor [...searchfor]\n(You didn't tell me what you want to search for.)"
else
a=""
for i in $*
do
if [ -z "$a" ]
then
a=$(egrep "$i" mydata)
else
a=$("$a" | egrep "$i")
fi
done
awk -f display.awk "$a"
fi
I expected all the lines containing every search term to be printed on the screen in the format defined in display.awk.

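$a contains data, not a command, so "$a" | egrep ... tries to run the data as a program. You need to write that data to the pipe instead. Two further fixes: loop over "$@" so that each argument stays intact, and feed the final result to awk on standard input, because awk treats "$a" given as an argument as a file name.
if [ $# -eq 0 ]
then
    printf '%s\n' "Usage: phoneA searchfor [...searchfor]" "(You didn't tell me what you want to search for.)" >&2
    exit 1
fi
a=""
for i in "$@"; do
    if [ -z "$a" ]; then
        a=$(egrep "$i" mydata)
    else
        a=$(printf '%s' "$a" | egrep "$i")
    fi
done
printf '%s\n' "$a" | awk -f display.awk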
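The repeated filtering amounts to keeping the lines that match every search term. A minimal Python sketch of that logic follows; lines_matching_all is a hypothetical name, and Python's re syntax is assumed to be close enough to egrep's extended regular expressions for simple terms.

```python
import re


def lines_matching_all(lines, patterns):
    """Keep only the lines in which every pattern matches somewhere."""
    compiled = [re.compile(p) for p in patterns]
    return [line for line in lines if all(rx.search(line) for rx in compiled)]
```

Unlike the shell loop, this version does not restart from the whole file when an intermediate result is empty, because it never relies on an empty variable to detect the first iteration.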

Resolve name by inode in current directory

How can I resolve a name from a given inode in the current directory? The following script is meant to print the filenames of all symlinks that point to the file passed as its argument, sorted by ctime.
#!/usr/bin/ksh
IFS="`printf '\n\t'`"
USAGE="usage: symlink.sh <file>"
get_ctime() {
perl -se 'use File::stat; $file=lstat($filename); print $file->ctime' -- -filename="$1"
}
stat_inode() {
perl -se 'use File::stat; $file=stat($filename); if (defined $file) { print $file->ino; }' -- -filename="$1"
}
lstat_inode() {
perl -se 'use File::stat; $file=lstat($filename); if (defined $file) { print $file->ino; }' -- -filename="$1"
}
if [ $# -eq 0 ]; then
echo "$USAGE"
exit 1
fi
FILE_NAME="$1"
FILE_INODE=$(stat_inode "$FILE_NAME")
if [ ! -e "$FILE_NAME" ]; then
echo "no such file \"$FILE_NAME\""
exit 1
fi
for LINK in ./* ./.[!.]* ;do
if [ -L "$LINK" ]; then
TARGET_INODE=$(stat_inode "$LINK")
if [ ! -z "$TARGET_INODE" ]; then
if [ "$FILE_INODE" -eq "$TARGET_INODE" ]; then
echo $(get_ctime "$LINK") $(lstat_inode "$LINK");
fi
fi
fi
done | sort -nk1 | awk '{print $2}'
Basically, I'd like to pipe awk to some kind of lookup function like this: | awk ' ' | lookup
I'd really appreciate if someone suggested a more elegant way to accomplish the task.
OS: SunOS 5.10
Shell: KSH

Something like this?
$ find . -maxdepth 1 -inum 2883399
./.jshintrc
$
or:
$ echo 2883399 | xargs -IX find . -maxdepth 1 -inum X
./.jshintrc
$
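If Python is available, the whole task can also be done without going through inode numbers and back to names. The following is a minimal sketch, assuming a POSIX system; symlinks_to is a hypothetical name. It compares device and inode of each resolved link with those of the target and sorts by the ctime of the link itself.

```python
import os


def symlinks_to(target, directory="."):
    """Names of symlinks in directory that resolve to target, oldest ctime first."""
    wanted = os.stat(target)
    found = []
    for entry in os.scandir(directory):
        if not entry.is_symlink():
            continue
        try:
            resolved = os.stat(entry.path)  # follows the link
        except OSError:
            continue  # dangling link
        if (resolved.st_dev, resolved.st_ino) == (wanted.st_dev, wanted.st_ino):
            # lstat describes the link itself, not the file it points to.
            found.append((os.lstat(entry.path).st_ctime, entry.name))
    return [name for _, name in sorted(found)]
```

os.scandir also lists dot files, so no separate pattern such as ./.[!.]* is needed.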

Using grep with sed and writing a new file based on the results

I'm very new to some of the command line utilities and have been looking for a while for a command that would accomplish my goal.
The goal is to find files that contain a string of text, replace it with a new string, and then write the results to a file that is named the same as the original, but in a different directory.
Obviously this is not working, so I am asking how you who know about this stuff would go about it.
grep -rl 'stringToFind' *.* | sed 's|oldString|newString|g' < fileNameFromGrep > ./new/fileNameFromGrep
Thanks for your input!
John

for f in $(find /YOUR/SEARCH/DIR/ROOT -type f -exec fgrep -l 'stringToFind' {} \;) ; do
    sed 's|oldString|newString|g' < "$f" > "./new/$(basename "$f")"
done
Will do it for you. The output keeps only the base name of each file, so every result lands directly in ./new.
If you have spaces in filenames:
find /PATH -type f -print0 | while IFS= read -r -d '' file
do
    fgrep -l 'stringToFind' "$file" && \
    sed 's|oldString|newString|g' < "$file" > "./new/$(basename "$file")"
done

#!/bin/bash
for file in *; do
if grep -qF 'stringToFind' "$file"; then
sed 's/oldString/newString/g' "$file" > "./new/$file"
fi
done

for file in path/to/dir/*
do
    if grep -q 'pattern' "$file"; then
        sed 's/oldString/newString/g' "$file" > "/path/to/newdir/$(basename "$file")"
    fi
done

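You can try:
sed -i -e "s/oldString/newString/g" \
$(grep -Rlsi 'pattern' path/to/dir/)
This edits the matching files in place rather than writing copies to another directory.
sed:
-i edit files in place
-e add the script that follows to the commands to run
grep:
-R recursive
-l print only the names of matching files
-s suppress error messages
-i ignore case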
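The find-then-replace-then-write pattern that these answers share can also be sketched in Python. This is a minimal sketch under stated assumptions: replace_into is a hypothetical name, only the top level of the source directory is scanned, the strings are treated literally rather than as regular expressions, and files that cannot be decoded as text are skipped.

```python
from pathlib import Path


def replace_into(src_dir, dst_dir, needle, old, new):
    """Copy files containing needle into dst_dir with old replaced by new."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src_dir.iterdir()):
        if not path.is_file():
            continue  # skips subdirectories, including dst_dir itself
        try:
            text = path.read_text()
        except UnicodeDecodeError:
            continue  # not a text file
        if needle in text:
            (dst_dir / path.name).write_text(text.replace(old, new))
            written.append(path.name)
    return written
```

For the command in the question, the call would be replace_into(".", "./new", "stringToFind", "oldString", "newString"); the originals are left untouched.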
