wc -c gives one more than I expected, why is that? - command-line

echo '2003'| wc -c
I thought it would give me 4, but it turned to be 5, what is that additional byte?

Because echo will get a new line.
echo "2014" | wc -c
it will get 5
printf "2014" | wc -c
it will get 4 where printf will not add a new line.

echo contains a built-in switch, -n, to remove newline. So running:
echo -n "2021" | wc -c
Will output the expected 4.

echo adds new line which is causing the issue.
As mentioned by "KyChen", you can use printf or:
a="2014 ;
echo $a |awk '{print length}'

Related

Bash or Python efficient substring matching and filtering

I have a set of filenames in a directory, some of which are likely to have identical substrings but not known in advance. This is a sorting exercise. I want to move the files with the maximum substring ordered letter match together in a subdirectory named with that number of letters and progress to the minimum match until no matches of 2 or more letters remain. Ignore extensions. Case insensitive. Ignore special characters.
Example.
AfricanElephant.jpg
elephant.jpg
grant.png
ant.png
el_gordo.tif
snowbell.png
Starting from maximum length matches to minimum length matches will result in:
./8/AfricanElephant.jpg and ./8/elephant.jpg
./3/grant.png and ./3/ant.png
./2/snowbell.png and ./2/el_gordo.tif
Completely lost on an efficient bash or python way to do what seems a complex sort.
I found some awk code which is almost there:
{
count=0
while ( match($0,/elephant/) ) {
count++
$0=substr($0,RSTART+1)
}
print count
}
where temp.txt contains a list of the files and is invoked as eg
awk -f test_match.awk temp.txt
Drawback is that a) this is hardwired to look for "elephant" as a string (I don't know how to make it take an input string (rather than file) and an input test string to count against, and
b) I really just want to call a bash function to do the sort as specified
If I had this I could wrap some bash script around this core awk to make it work.
function longest_common_substrings () {
shopt -s nocasematch
for file1 in * ; do for file in * ; do \
if [[ -f "$file1" ]]; then
if [[ -f "$file" ]]; then
base1=$(basename "$file" | cut -d. -f1)
base2=$(basename "$file1" | cut -d. -f1)
if [[ "$file" == "$file1" ]]; then
echo -n ""
else
echo -n "$file $file1 " ; $HOME/Scripts/longest_common_substring.sh "$base1" "$base2" | tr -d '\n' | wc -c | awk '{$1=$1;print}' ;
fi
fi
fi
done ;
done | sort -r -k3 | awk '{ print $1, $3 }' > /tmp/filesort_substring.txt
while IFS= read -r line; do \
file_to_move=$(echo "$line" | awk '{ print $1 }') ;
directory_to_move_to=$(echo "$line" | awk '{ print $2 }') ;
if [[ -f "$file_to_move" ]]; then
mkdir -p "$directory_to_move_to"
\gmv -b "$file_to_move" "$directory_to_move_to"
fi
done < /tmp/filesort_substring.txt
shopt -u nocasematch
where $HOME/Scripts/longest_common_substring.sh is
#!/bin/bash
shopt -s nocasematch
if ((${#1}>${#2})); then
long=$1 short=$2
else
long=$2 short=$1
fi
lshort=${#short}
score=0
for ((i=0;i<lshort-score;++i)); do
for ((l=score+1;l<=lshort-i;++l)); do
sub=${short:i:l}
[[ $long != *$sub* ]] && break
subfound=$sub score=$l
done
done
if ((score)); then
echo "$subfound"
fi
shopt -u nocasematch
Kudos to the original solution for computing the match in the script which I found elsewhere in this site

problems while reading log file with tail -n0 -F

i am monitoring the asterisk log file for peers that get offline.
the if part is working correct, but the sed command is not executed in the else part, although the echo command works. What do i need to change
tail -n0 -F /var/log/asterisk/messages | \
while read LINE
do
if echo "$LINE" | /bin/grep -q "is now UNREACHABLE!"
then
EXTEN=$(echo $LINE | /bin/grep -o -P "(?<=\').*(?=\')")
echo "$EXTEN is now UNREACHABLE!"
CALLERID=$(/bin/sed -n '/^\['"$EXTEN"'\]/,/^\[.*\]/{/^callerid*/p}' "$SIP" | /usr/bin/awk -F'=' '{ print $2 }')
if .......
then
.......
fi
elif echo "$LINE" | /bin/grep -q "is now REACHABLE!"
then
EXTEN=$(echo $LINE | /bin/grep -o -P "(?<=\').*(?=\')")
echo "$EXTEN is now REACHABLE!"
if /bin/grep -qi "^$EXTEN;" $OFFLINE; then
/bin/sed -i '/^$EXTEN;/d' $OFFLINE
fi
fi
done
You have a quoting problem - you've used single quotes when the string includes a shell variable:
if /bin/grep -qi "^$EXTEN;" $OFFLINE; then
/bin/sed -i '/^$EXTEN;/d' $OFFLINE
fi
Try using double quotes instead:
if /bin/grep -qi "^$EXTEN;" $OFFLINE; then
/bin/sed -i "/^$EXTEN;/d" $OFFLINE
fi

assign actions to one sed address simultaneously for match and non-match

My sed command line script looks like
echo "a,b,c,d" | sed -ne 's/[^a-zA-Z0-9]//g; /^...$/ p; /^...$/! q1'
I want the script to succeed (return-code 0) if there are exactly 3 letters left, and to fail otherwise.
The slightly nagging part is that I have to duplicate the address /^...$/.
I was hoping for something like
echo "a,b,c,d" | sed -ne 's/[^a-zA-Z0-9]//g; /^...$/ p ! q1'
but that doesn't work, at least not with that syntax.
You can use // to represent previously used regex
$ echo "a,b,c,d" | sed -ne 's/[^a-zA-Z0-9]//g; /^...$/ p; //! q1'
$ echo $?
1
$ echo "b,c,d" | sed -ne 's/[^a-zA-Z0-9]//g; /^...$/ p; //! q1'
bcd
$ echo $?
0
Alternatively, you can use b command to start next cycle
$ echo "b,c,d" | sed -ne 's/[^a-zA-Z0-9]//g; /^...$/{p;b}; q1'
bcd
$ echo $?
0
$ echo "a,b,c,d" | sed -ne 's/[^a-zA-Z0-9]//g; /^...$/{p;b}; q1'
$ echo $?
1
This syntax will probably work with GNU sed only. See manual for details

Pattern extraction using SED or AWK

How do I extract 68 from v1+r0.68?
Using awk, returns everything after the last '.'
echo "v1+r0.68" | awk -F. '{print $NF}'
Using sed to get the number after the last dot:
echo 'v1+r0.68' | sed 's/.*[.]\([0-9][0-9]*\)$/\1/'
grep is good at extracting things:
kent$ echo " v1+r0.68"|grep -oE "[0-9]+$"
68
Match the digit string before the end of the line using grep:
$ echo 'v1+r0.68' | grep -Eo '[0-9]+$'
68
Or match any digits after a .
$ echo 'v1+r0.68' | grep -Po '(?<=\.)\d+'
68
Print everything after the . with awk:
echo "v1+r0.68" | awk -F. '{print $NF}'
68
Substitute everything before the . with sed:
echo "v1+r0.68" | sed 's/.*\.//'
68
type man grep
and you will see
...
-o, --only-matching
Show only the part of a matching line that matches PATTERN.
then type echo 'v1+r0.68' | grep -o '68'
if you want it any where special do:
echo 'v1+r0.68' | grep -o '68' > anyWhereSpecial.file_ending

extract number from string

I have a string ABCD20110420.txt and I want to extract the date out of it. Expected 2011-04-20
I can use replace to remove the text part, but how do I insert the "-" ?
# echo "ABCD20110420.txt" | replace 'ABCD' '' | replace '.txt' ''
20110420
echo "ABCD20110420.txt" | sed -e 's/ABCD//' -e 's/.txt//' -e 's/\(....\)\(..\)\(..\)/\1-\2-\3/'
Read: sed FAQ
Just use the shell (bash)
$> file=ABCD20110420.txt
$> echo "${file//[^0-9]/}"
20110420
$> file="${file//[^0-9]/}"
$> echo $file
20110420
$> echo ${file:0:4}-${file:4:2}-${file:6:2}
2011-04-20
The above is applicable to files like your sample. If you have files like A1BCD20110420.txt, then will not work.
For that case,
$> file=A1BCD20110420.txt
$> echo ${file%.*} #get rid of .txt
A1BCD20110420
$> file=${file%.*}
$> echo "2011${file#*2011}"
20110420
Or you can use regular expression (Bash 3.2+)
$> file=ABCD20110420.txt
$> [[ $file =~ ^.*(2011)([0-9][0-9])([0-9][0-9])\.*$ ]]
$> echo ${BASH_REMATCH[1]}
2011
$> echo ${BASH_REMATCH[2]}
04
$> echo ${BASH_REMATCH[3]}
20
echo "ABCD20110420.txt" | sed -r 's/.+([0-9]{4})([0-9]{2})([0-9]{2}).+/\1-\2-\3/'
$ file=ABCD20110420.txt
$ echo "$file" | sed -e 's/^[A-Za-z]*\([0-9][0-9][0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9]\)\.txt$/\1-\2-\3/'
This only requires a single call to sed.
echo "ABCD20110420.txt" | sed -r 's/.{4}(.{4})(.{2})(.{2}).txt/\1-\2-\3/'