POSIX sh: how to count the number of occurrences in a string without using external tools?

In bash, it can be done like this:
#!/bin/bash
query='bengal'
string_to_search='bengal,toyger,bengal,persian,bengal'
delimiter='|'
replace_queries="${string_to_search//"$query"/"$delimiter"}"
delimiter_count="${replace_queries//[^"$delimiter"]}"
delimiter_count="${#delimiter_count}"
echo "Found $delimiter_count occurrences of \"$query\""
Output:
Found 3 occurrences of "bengal"
The caveat of course is that the delimiter cannot occur in 'query' or 'string_to_search'.
In POSIX sh, string replacement is not supported. Is there a way this can be done in POSIX sh using only shell builtins?

#!/bin/sh
query='bengal'
string_to_search='bengal,toyger,bengal,persian,bengal'
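ct() (
    n=0
    q=$1
    IFS=,
    set -f      # disable globbing so the unquoted expansion only splits on commas
    set -- $2   # split the string into positional parameters
    for t in "$@"; do
        if [ "$t" = "$q" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
)
n=$(ct "$query" "$string_to_search")
printf 'found %d %s\n' "$n" "$query"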
Though I'm not sure what the point is. If you've got a posix shell,
you also almost certainly have printf, sed, grep, and wc.
printf '%s\n' "$string_to_search" | tr ',' '\n' | grep -Fx "$query" | wc -l

Think I got it...
#!/bin/sh
query='bengal'
string_to_search='bengal,toyger,bengal,persian,bengal'
i=0
process_string="$string_to_search"
while [ -n "$process_string" ]; do
case "$process_string" in
*"$query"*)
process_string="${process_string#*"$query"}"
i="$(( i + 1 ))"
;;
*)
break
;;
esac
done
echo "Found $i occurrences of \"$query\""
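For reuse, the loop above could be wrapped in a function. This is only a sketch: the name count_occurrences is made up here, POSIX sh has no local variables (so query, rest and count are globals), and the guard for an empty query is an addition. With an empty query the pattern always matches while the string never gets shorter, so the loop would not terminate.

```shell
#!/bin/sh
# count_occurrences QUERY STRING (hypothetical helper):
# print the number of non-overlapping occurrences of QUERY in STRING,
# using only shell builtins.
count_occurrences() {
    query=$1 rest=$2 count=0
    # An empty query matches everywhere and would never shorten $rest.
    [ -n "$query" ] || { echo 0; return; }
    while :; do
        case $rest in
            *"$query"*)
                rest=${rest#*"$query"}   # drop everything up to and including the first match
                count=$((count + 1))
                ;;
            *) break ;;
        esac
    done
    echo "$count"
}

count_occurrences bengal 'bengal,toyger,bengal,persian,bengal'   # prints 3
```

Matches are consumed left to right, so overlapping occurrences are not counted twice ("aa" is found twice in "aaaa", not three times).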

Related

Subset a string in POSIX shell

I have a variable set in the following format:
var1="word1 word2 word3"
Is it possible to subset the variable, i.e. delete one or more of the space-delimited words, portably? What I want to achieve is something like this:
when --ignore option is supplied with the following argument
$ cmd --ignore word1 # case 1
$ cmd --ignore "word1 word2" # case2
I want var1 to change so that it has only the following value:
"word2 word3" # case1
"word3" #case2
If there is no way to achieve the above, is there a way to improve the efficiency of the following for loop? ($var1 is iterated in a for loop, so my alternative idea for achieving something similar was the following code.)
# while loop to get argument from options
# argument of `--ignore` is assigned to `$sh_ignore`
for i in $var1
do
# check $i in $sh_ignore instead of other way around
# to avoid unmatch when $sh_ignore has more than 1 word
if ! echo "$sh_ignore" | grep "$i";
then
# normal actions
else
# skipped
fi
done
-------Update-------
After looking around and reading the comment by @chepner, I am now temporarily using the following code (and am looking for improvements):
sh_ignore=''
while :; do
case $1 in
# some other option handling
--ignore)
if [ "$2" ]; then
sh_ignore=$2
shift
else
# defined `die` as print err msg + exit 1
die 'ERROR: "--ignore" requires a non-empty option argument.'
fi
;;
# handling if no arg is supplied to --ignore
# handling -- and unknown opt
esac
shift
done
if [ -n "$sh_ignore" ]; then
for d in $sh_ignore
do
var1="$(echo "$var1" | sed -e "s,$d,,")"
done
fi
# for loop with trimmed $var1 as downstream
for i in $var1
do
# normal actions
done
One method might be:
var1=$(echo "$var1" |
tr ' ' '\n' |
grep -Fxv -e "$(echo "$sh_ignore" | tr ' ' '\n')" |
tr '\n' ' ')
Note: this will leave a trailing blank, which can be trimmed off via var1=${var1% }
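If external tools are to be avoided altogether, the filtering can also be sketched with builtins only. The function name remove_words is hypothetical, and the sketch assumes the words in both lists are separated by spaces (not tabs or newlines). The body runs in a subshell so that set -f and the helper variables do not leak into the caller.

```shell
#!/bin/sh
# remove_words LIST IGNORE (hypothetical helper):
# print LIST without any word that also appears in IGNORE.
remove_words() (
    result=
    set -f                      # no globbing while $1 is split into words
    for word in $1; do
        case " $2 " in
            *" $word "*) ;;     # word is in the ignore list: skip it
            *) result="${result:+$result }$word" ;;
        esac
    done
    printf '%s\n' "$result"
)

var1="word1 word2 word3"
remove_words "$var1" "word1"          # prints: word2 word3
remove_words "$var1" "word1 word2"    # prints: word3
```

In the script from the question this would be used as var1=$(remove_words "$var1" "$sh_ignore"). Unlike the sed substitution, it only removes whole words, so ignoring "word1" does not damage a word such as "word10".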

Merge two lines into one within a configuration file

I have several AIX systems with a configuration file, let's call it /etc/bar/config. The file may or may not have a line declaring values for foo. An example would be:
foo = A_1,GROUP_1,USER_1,USER_2,USER_3
The foo line may or may not be the same on all systems. Different systems may have different values and a different number of values. My task is to add "bare minimum" values in the config file on all systems. The bare minimum line will look like this.
foo = A_1,USER_1,SYS_1,SYS_2
If the line does not exist, I must create it. If the line does exist, I must merge the two lines. Using my examples, the result would be this. The order of the values does not matter.
foo = A_1,GROUP_1,USER_1,USER_3,USER_2,SYS_1,SYS_2
Obviously I want a script to do my work. I have the standard sh, ksh, awk, sed, grep, perl, cut, etc. Since this is AIX, I do not have access to the GNU versions of these utilities.
Originally, I had a script with these commands to replace the entire foo line.
cp /etc/bar/config /etc/bar/config.$$
sed "s/foo = .*/foo = A_1,USER_1,SYS_1,SYS_2/" /etc/bar/config.$$ > /etc/bar/config
But this simply replaces the line. It does not take into consideration any pre-existing configuration, or the case where the line is missing. I'm also doing other configuration modifications in the script, such as adding completely unique lines to other files and restarting a process, so I'd prefer this to be some type of shell-based code snippet I can add to my change script. I am open to other options, especially if the solution is simpler.
Some dirty bash/sed:
#!/usr/bin/bash
input_file="some_filename"
v=$(grep -n '^foo *=' "$input_file")
lineno=$(cut -d: -f1 <<< "${v}0:")
base="A_1,USER_1,SYS_1,SYS_2,"
if [[ "$lineno" == 0 ]]; then
echo "foo = A_1,USER_1,SYS_1,SYS_2" >> "$input_file"
else
all=$(sed -n "${lineno}"'s/^foo *= */'"$base"'/p' "$input_file" | \
tr ',' '\n' | sort | uniq | tr '\n' ',' | \
sed -e 's/^/foo = /' -e 's/, *$//' -e 's/  */ /g')
sed -i "${lineno}"'s/.*/'"$all"'/' "$input_file"
fi
Untested bash, etc.
config=/etc/bar/config
default=A_1,USER_1,SYS_1,SYS_2
pattern='^foo[[:blank:]]*=[[:blank:]]*' # shared with grep and sed
if grep -q "$pattern" "$config" # test grep itself; a grep | sed pipeline would report sed's status
then
current=$( sed -n "s/$pattern//p" "$config" )
new=$( echo "$current,$default" | tr ',' '\n' | sort | uniq | paste -sd, - )
sed "s/$pattern.*/foo = $new/" "$config" > "$config.$$.tmp" &&
mv "$config.$$.tmp" "$config"
else
echo "foo = $default" >> "$config"
fi
A vanilla perl solution:
perl -i -lpe '
BEGIN {%foo = map {$_ => 1} qw/A_1 USER_1 SYS_1 SYS_2/}
if (s/^foo\s*=\s*//) {
$found=1;
$foo{$_}=1 for split /,/;
$_ = "foo = " . join(",", keys %foo);
}
END {print "foo = " . join(",", keys %foo) unless $found}
' /etc/bar/config
This Perl code will do as you ask. It expects the path to the file to be modified as a parameter on the command line.
Note that it reads the entire input file into the array @config and then overwrites the same file with the modified data.
It works by building a hash %values from a combination of the items already present in the foo = line and the list of default items in @defaults. The combination is sorted in alphabetical order and joined with commas.
use strict;
use warnings;
my @defaults = qw/ A_1 USER_1 SYS_1 SYS_2 /;
my ($file) = @ARGV;
my @config = <>;
open my $out_fh, '>', $file or die $!;
select $out_fh;
for ( @config ) {
if ( my ($pfx, $vals) = /^(foo \s* = \s* ) (.+) /x ) {
my %values;
++$values{$_} for $vals =~ /[^,\s]+/g;
++$values{$_} for @defaults;
print $pfx, join(',', sort keys %values), "\n";
}
else {
print;
}
}
close $out_fh;
output
foo = A_1,GROUP_1,SYS_1,SYS_2,USER_1,USER_2,USER_3
Since you didn't provide sample input and expected output I couldn't test this but this is the right approach:
awk '
/foo = / { old = ","$3; next }
{ print }
END {
split("A_1,USER_1,SYS_1,SYS_2"old,all,/,/)
for (i in all)
if (!seen[all[i]]++)
new = (new ? new "," : "") all[i]
print "foo =", new
}
' /etc/bar/config > tmp && mv tmp /etc/bar/config
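The step all of these answers share is taking the union of the existing values and the defaults. As a sketch, that step can be isolated into a small function that only needs POSIX awk. The name merge_values is made up for illustration, and the sketch keeps the values in first-seen order rather than sorting them (the question says the order does not matter).

```shell
#!/bin/sh
# merge_values CURRENT DEFAULTS (hypothetical helper):
# print the comma-separated union of both lists, without duplicates,
# in first-seen order. Either list may be empty.
merge_values() {
    printf '%s,%s\n' "$1" "$2" | awk -F, '{
        out = ""
        for (i = 1; i <= NF; i++)
            if ($i != "" && !seen[$i]++)
                out = (out == "" ? "" : out ",") $i
        print out
    }'
}

merge_values 'A_1,GROUP_1,USER_1,USER_2,USER_3' 'A_1,USER_1,SYS_1,SYS_2'
# prints: A_1,GROUP_1,USER_1,USER_2,USER_3,SYS_1,SYS_2
```

Called with an empty first argument it simply returns the defaults, which covers the case where the foo line does not exist yet.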

How to remove YAML frontmatter from markdown files?

I have markdown files that contain YAML frontmatter metadata, like this:
---
title: Something Somethingelse
author: Somebody Sometheson
---
But the YAML is of varying widths. Can I use a Posix command like sed to remove that frontmatter when it's at the beginning of a file? Something that just removes everything between --- and ---, inclusive, but also ignores the rest of the file, in case there are ---s elsewhere.
I understand your question to mean that you want to remove the first block enclosed in --- lines if it starts at the first line. In that case,
sed '1 { /^---/ { :a N; /\n---/! ba; d} }' filename
This is:
1 { # in the first line
/^---/ { # if it starts with ---
:a # jump label for looping
N # fetch the next line, append to pattern space
/\n---/! ba; # if the result does not contain \n--- (that is, if the last
# fetched line does not begin with ---), go back to :a
d # then delete the whole thing.
}
}
# otherwise drop off the end here and do the default (print
# the line)
Depending on how you want to handle lines that begin with ---abc or so, you may have to change the patterns a little (perhaps add $ at the end to only match when the whole line is ---). I'm a bit unclear on your precise requirements there.
If you want to remove only the front matter, you could simply run:
sed '1{/^---$/!q;};1,/^---$/d' infile
If the first line doesn't match ---, sed will quit; else it will delete everything from the 1st line up to (and including) the next line matching --- (i.e. the entire front matter).
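Note that q prints the first line before quitting, so with this one-liner a file that has no front matter is cut down to its first line. Where such files must pass through unchanged, the same idea can be sketched with shell builtins only. The function name strip_front_matter is hypothetical, and the sketch assumes the delimiters are lines consisting of exactly ---.

```shell
#!/bin/sh
# strip_front_matter (hypothetical helper): copy stdin to stdout, dropping
# a leading block delimited by --- lines. Input without such a block is
# passed through unchanged.
strip_front_matter() {
    state=start                          # start -> inside -> body
    while IFS= read -r line || [ -n "$line" ]; do
        case $state in
            start)
                if [ "$line" = "---" ]; then
                    state=inside         # opening delimiter on line 1
                    continue
                fi
                state=body               # no front matter in this input
                ;;
            inside)
                # Skip every line up to and including the closing ---.
                [ "$line" = "---" ] && state=body
                continue
                ;;
        esac
        printf '%s\n' "$line"
    done
}

printf '%s\n' '---' 'title: Something' '---' 'Body text' | strip_front_matter
# prints: Body text
```

Later --- lines in the body are kept, because the state never returns to "inside". If the opening --- is never closed, everything is dropped, just as with the sed range.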
If you don't mind the "or something" being perl.
Simply print after two instances of "---" have been found:
perl -ne 'if ($i > 1) { print } else { /^---/ && $i++ }' yaml
or a bit shorter if you don't mind abusing ?: for flow control:
perl -ne '$i > 1 ? print : /^---/ && $i++' yaml
Be sure to include -i if you want to replace inline.
You can use a bash script: create script.sh, make it executable with chmod +x script.sh, and run it with ./script.sh.
#!/bin/bash
# the articles folder contains a lot of markdown files
for f in ./articles/*.md
do
# print the file name without its directory
echo "${f##*/}"
# wrap the value of the front matter title attribute in double quotes
sed -i -r 's/^title: (.*)$/title: "\1"/' "$f"
#...
done
This AWK-based solution works for files with and without front matter, doing nothing in the latter case.
#!/bin/sh
# Strips YAML FrontMatter from a file (usually Markdown).
# Exit immediately on error;
# see: https://vaneyckt.io/posts/safer_bash_scripts_with_set_euxo_pipefail/
set -e
print_help() {
echo "Strips YAML FrontMatter from a file (usually Markdown)."
echo
echo "Usage:"
echo " `basename $0` -h"
echo " `basename $0` --help"
echo " `basename $0` -i <file-with-front-matter>"
echo " `basename $0` --in-place <file-with-front-matter>"
echo " `basename $0` <file-with-front-matter> <file-to-be-without-front-matter>"
}
replace=false
in_file="-"
out_file="/dev/stdout"
if [ -n "$1" ]
then
if [ "$1" = "-h" ] || [ "$1" = "--help" ]
then
print_help
exit 0
elif [ "$1" = "-i" ] || [ "$1" = "--in-place" ]
then
replace=true
in_file="$2"
out_file="$in_file"
else
in_file="$1"
if [ -n "$2" ]
then
out_file="$2"
fi
fi
fi
tmp_out_file="$out_file"
if $replace
then
tmp_out_file="${in_file}_tmp"
fi
awk '
# Front matter can only start on the very first line.
NR == 1 && /^---$/ {
    in_fm = 1
    next
}
# A line of --- or ... (literal dots) closes the front matter; drop it too.
in_fm && /^(---|\.\.\.)$/ {
    in_fm = 0
    next
}
# Print everything outside the front matter.
!in_fm {
    print
}
' "$in_file" >> "$tmp_out_file"
if $replace
then
mv "$tmp_out_file" "$out_file"
fi

Any way to find if two adjacent new lines start with certain words?

Say I have a file like so:
+jaklfjdskalfjkdsaj
fkldsjafkljdkaljfsd
-jslakflkdsalfkdls;
+sdjafkdjsakfjdskal
I only want to find and count the number of times in this file that a line starting with - is immediately followed by a line starting with +.
Rules:
No external scripts
Must be done from within a bash script
Must be inline
I could figure out how to do this in a Python script, for instance, but I've never had to do something this extensive in Bash.
Could anyone help me out? I figure it'll end up being grep, perl, or maybe a talented sed line -- but these are things I'm still learning.
Thank you all!
grep -A1 "^-" "$file" | grep "^+" | wc -l
The first grep finds all of the lines starting with -, and the -A1 causes it to also output the line after the match too.
We then grep that output for any lines starting with +. Logically:
We know the output of the first grep is only the -XXX lines and the following lines
We know that a +xxx line cannot also be a -xxx line
Therefore, any +xxx lines must be following lines, and should be counted, which we do with wc -l
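The -A option is an extension that POSIX does not specify for grep. Where it is not available, the same count can be sketched with POSIX awk by remembering whether the previous line started with -. The function name count_minus_plus is made up for illustration.

```shell
#!/bin/sh
# count_minus_plus [FILE...] (hypothetical helper):
# count lines starting with + whose preceding line starts with -.
# Reads stdin when no file is given.
count_minus_plus() {
    awk '
        /^[+]/ && prev { c++ }     # previous line started with -
        { prev = /^-/ }            # remember this line for the next one
        END { print c + 0 }        # c + 0 prints 0 when nothing matched
    ' "$@"
}

printf '%s\n' '+jaklf' 'fklds' '-jslak' '+sdjaf' | count_minus_plus   # prints 1
```

Because only the directly preceding line is remembered, an empty or unrelated line between a - line and a + line breaks the pair.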
Easy in Perl:
perl -lne '$c++ if $p and /^\+/; $p = /^-/ }{ print $c' FILE
awk one-liner:
awk -v FS='' '{x=x sprintf("%s", $1)}END{print gsub(/-\+/,"",x)}' file
e.g.
kent$ cat file
+jaklfjdskalfjkdsaj
fkldsjafkljdkaljfsd
-jslakflkdsalfkdls;
+sdjafkdjsakfjdskal
-
-
-
+
-
+
foo
+
kent$ awk -v FS='' '{x=x sprintf("%s", $1)}END{print gsub(/-\+/,"",x)}' file
3
Another Perl example. Not as terse as choroba's, but more transparent in how it works:
perl -e'while (<>) { $last = $cur; $cur = $_; print $last, $cur if substr($last, 0, 1) eq "-" && substr($cur, 0, 1) eq "+" }' < infile
Output:
-jslakflkdsalfkdls;
+sdjafkdjsakfjdskal
Pure bash:
c=0 p=1   # c: match count; p: exit status of the "previous line starts with -" test
while IFS= read -r line ; do
    [[ $line == +* && $p == 0 ]] && (( c++ ))
    [[ $line == -* ]]
    p=$?
done < FILE
echo "$c"

How to delete multiple empty lines with SED?

I'm trying to compress a text document by deleting duplicated empty lines with sed. This is what I'm doing (to no avail):
sed -i -E 's/\n{3,}/\n/g' file.txt
I understand that it's not correct, according to this manual, but I can't figure out how to do it correctly. Thanks.
I think you want to replace spans of multiple blank lines with a single blank line, even though your example replaces multiple runs of \n with a single \n instead of \n\n. With that in mind, here are two solutions:
sed '/^$/{ :l
N; s/^\n$//; t l
p; d; }' input
In many implementations of sed, that can be all on one line, with the embedded newlines replaced by ;.
awk 't || !/^$/; { t = !/^$/ }'
As tripleee suggested above, I'm using Perl instead of sed:
perl -0777pi -e 's/\n{3,}/\n\n/g'
Use the translate function
tr -s '\n'
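The -s or --squeeze-repeats option reduces a sequence of repeated characters to a single instance. Applied to '\n', this collapses every run of newlines to one, so no empty lines remain at all.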
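A quick way to compare the two behaviours on a small sample is sketched below. The function names sample, squeeze_all and squeeze_to_one are made up for illustration; the awk program is the one-liner from the earlier answer.

```shell
#!/bin/sh
# Sample input: "a", three empty lines, "b".
sample() { printf 'a\n\n\n\nb\n'; }

# Squeezes every run of newlines to one, so no empty line survives.
squeeze_all() { tr -s '\n'; }

# Keeps one empty line per run (a run at the very start of the input
# is dropped entirely, because no non-empty line precedes it).
squeeze_to_one() { awk 't || !/^$/; { t = !/^$/ }'; }

sample | squeeze_all       # prints: a, b
sample | squeeze_to_one    # prints: a, one empty line, b
```

Which of the two is wanted depends on whether paragraph breaks should survive.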
This is much better handled by tr -s '\n' or cat -s, but if you insist on sed, here's an example from section 4.17 of the GNU sed manual:
#!/usr/bin/sed -f
# on empty lines, join with next
# Note there is a star in the regexp
:x
/^\n*$/ {
N
bx
}
# now, squeeze all '\n', this can be also done by:
# s/^\(\n\)*/\1/
s/\n*/\
/
I am not sure this is what the OP wanted but using the awk solution by William Pursell here is the approach if you want to delete ALL empty lines in the file:
awk '!/^$/' file.txt
Explanation:
The awk pattern
'!/^$/'
contains the regex /^$/, which matches a line consisting only of the beginning of a line (symbolised by '^') and the end of a line (symbolised by '$'), in other words an empty line. The leading ! negates the match, so the pattern is true for every line that is not empty.
If this pattern is true awk applies its default and prints the current line.
HTH
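If lines that contain only spaces or tabs should be treated as empty too, a common variant tests the field count instead. This is a sketch under that assumption, with a made-up function name drop_blank_lines: with the default field separator, a whitespace-only line has zero fields.

```shell
#!/bin/sh
# drop_blank_lines [FILE...] (hypothetical helper):
# print only lines that contain at least one non-blank character.
drop_blank_lines() {
    awk 'NF > 0' "$@"
}

printf 'a\n\n   \nb\n' | drop_blank_lines   # prints: a, b
```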
I think OP wants to compress empty lines, e.g. where there are 9 consecutive empty lines, he wants to have just three.
I have written a little bash script that does just that:
#!/bin/bash
# Drop a line only when it and the next two lines are all blank
# (lines past the end of the file count as blank).
file=file.txt
tmp=temp.txt
: > "$tmp"                          # start with an empty temp file
total=$(wc -l < "$file")
# is_blank N: true if line N of $file is empty or a single space
is_blank() {
    local text
    text=$(sed -n "${1}p" "$file")
    [[ $text == "" || $text == " " ]]
}
current=1
while [ "$current" -le "$total" ]
do
    if is_blank "$current" && is_blank $((current + 1)) && is_blank $((current + 2))
    then
        # do not copy the line to the temp file
        echo "Skipping line $current"
    else
        sed -n "${current}p" "$file" >> "$tmp"
        echo "Writing line $current"
    fi
    ((current++))
done
cat "$tmp" > "$file"
rm "$tmp"
final=$(wc -l < "$file")
echo "Deleted $((total - final)) empty lines."