Iterate over $@ stored in another variable in another function - sh

How can I iterate over $@ after it has been stored in another variable in another function?
Note this is about the sh shell, not bash.
My code (super simplified):
#! /bin/sh
set -- a b "c d"
args=
argv() {
shift # pretend handling options
args="$@" # remaining arguments
}
fun() {
for arg in "$args"; do
echo "+$arg+"
done
}
argv "$@"
fun
Output:
+b c d+
I want:
+b+
+c d+
The special variable $@ stores argv preserving whitespace. The for loop can loop over $@ also preserving whitespace.
set -- a b "c d"
for arg in "$@"; do
echo "+$arg+"
done
Output:
+a+
+b+
+c d+
But once $@ is assigned to another variable the whitespace preserving is gone.
set -- a b "c d"
args="$@"
for arg in "$args"; do
echo "+$arg+"
done
Output:
+a b c d+
Without quotes:
for arg in $args; do
echo "+$arg+"
done
Output:
+a+
+b+
+c+
+d+
In bash it can be done using arrays.
set -- a b "c d"
args=("$@")
for arg in "${args[@]}"; do
echo "+$arg+"
done
Output:
+a+
+b+
+c d+
Can that be done in the sh shell?

You could use shift again inside fun if you know the shift has been performed in argv.
#! /bin/sh
set -- a b "c d"
args=
argv() {
shifted=1 # pretend handling options
shift $shifted
}
fun() {
[ -n "$shifted" ] && shift "$shifted"
for arg; do
echo "+$arg+"
done
}
argv "$@"
fun "$@"
Output:
+b+
+c d+

Here are two workarounds. Both have caveats.
First workaround: put newlines between arguments then use read.
set -- a b " c d "
args=
argv() {
shift
for arg in "$@"; do
args="$args$arg\n"
done
}
fun() {
printf "$args" | while IFS= read -r arg; do
echo "+$arg+"
done
}
argv "$@"
fun
Output:
+b+
+ c d +
Note that even the spaces before and after are preserved.
Caveat: if the arguments contain newlines you are screwed.
Second workaround: put quotes around arguments then use eval.
set -- a b " c d "
args=
argv() {
shift
for arg in "$@"; do
args="$args \"$arg\""
done
}
fun() {
for arg in "$@"; do
echo "+$arg+"
done
}
argv "$@"
eval fun "$args"
Caveat: if the arguments contain quotes you are screwed.
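One way around the quoting caveat, as a minimal sketch: wrap each argument in single quotes and rewrite any embedded single quote as '\'' before storing it, so that eval reconstructs every argument exactly. The helper name quote_one and the sample arguments are my own, not part of the answer above.

```shell
#!/bin/sh
# Hypothetical helper: wrap one string in single quotes, rewriting each
# embedded ' as '\'' so that eval reproduces the string exactly.
quote_one() {
    qo_rest=$1
    qo_done=
    while :; do
        case $qo_rest in
            *\'*)
                qo_head=${qo_rest%%\'*}      # text before the first '
                qo_rest=${qo_rest#*\'}       # text after the first '
                qo_done="$qo_done$qo_head'\\''"
                ;;
            *) break ;;
        esac
    done
    printf "'%s'" "$qo_done$qo_rest"
}

args=
argv() {
    shift                                    # pretend handling options
    for arg in "$@"; do
        args="$args $(quote_one "$arg")"
    done
}

fun() {
    for arg in "$@"; do
        echo "+$arg+"
    done
}

set -- a b " c d " "it's \"quoted\""
argv "$@"
eval "fun $args"
```

Only the single quote is special inside single quotes, so arguments containing double quotes, spaces, $ or newlines should survive the round trip as well.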

Related

Subset a string in POSIX shell

I have a variable set in the following format:
var1="word1 word2 word3"
Is it possible to remove one or more of the space-delimited words portably? What I want to achieve is something like this:
When the --ignore option is supplied with one of the following arguments
$ cmd --ignore word1 # case 1
$ cmd --ignore "word1 word2" # case2
I want var1 to end up with only the following value
"word2 word3" # case1
"word3" #case2
If there is no way to achieve the above, is there a way to improve the efficiency of the following for loop? ($var1 is iterated over in a for loop, so my alternative idea for achieving something similar was the following code.)
# while loop to get argument from options
# argument of `--ignore` is assigned to `$sh_ignore`
for i in $var1
do
# check $i in $sh_ignore instead of other way around
# to avoid unmatch when $sh_ignore has more than 1 word
if ! echo "$sh_ignore" | grep "$i";
then
# normal actions
else
# skipped
fi
done
-------Update-------
After looking around and reading the comment by @chepner, I am now temporarily using the following code (and am looking for improvements):
sh_ignore=''
while :; do
case $1 in
# some other option handling
--ignore)
if [ "$2" ]; then
sh_ignore=$2
shift
else
# defined `die` as print err msg + exit 1
die 'ERROR: "--ignore" requires a non-empty option argument.'
fi
;;
# handling if no arg is supplied to --ignore
# handling -- and unknown opt
esac
shift
done
if [ -n "$sh_ignore" ]; then
for d in $sh_ignore
do
var1="$(echo "$var1" | sed -e "s,$d,,")"
done
fi
# for loop with trimmed $var1 as downstream
for i in $var1
do
# normal actions
done
One method might be:
var1=$(echo "$var1" |
tr ' ' '\n' |
grep -Fxv -e "$(echo "$sh_ignore" | tr ' ' '\n')" |
tr '\n' ' ')
Note: this will leave a trailing blank, which can be trimmed off via var1=${var1% }
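If external tools are to be avoided entirely, a case pattern can do whole-word matching. This also sidesteps the substring problem of the sed loop in the question, where removing word1 would clip word10 too. A sketch, assuming both lists are separated by spaces; remove_words is a hypothetical name:

```shell
#!/bin/sh
# remove_words LIST IGNORE
# Print LIST without any word that also appears in IGNORE.
# The body runs in a subshell so that `set -f` does not leak to the caller.
remove_words() (
    set -f                       # no globbing while $1 is split into words
    out=
    for w in $1; do
        case " $2 " in
            *" $w "*) ;;         # whole-word match: skip it
            *) out="${out:+$out }$w" ;;
        esac
    done
    printf '%s\n' "$out"
)

var1="word1 word2 word3"
sh_ignore="word1 word2"
var1=$(remove_words "$var1" "$sh_ignore")
echo "$var1"
```

With the values above this should leave var1 set to word3, and no trailing blank needs trimming.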

Separate command-line arguments into two lists and pass to programs (shell)

What I want to do is take a list of command-line arguments like abc "def ghi" "foo bar" baz (note that some arguments are quoted because they contain spaces), and separate them out into two lists of arguments which then get passed to other programs that are invoked by the script. For example, odd-numbered arguments to one program and even-numbered arguments to another program. It is important to preserve proper quoting.
Please note, I need a solution in pure Bourne Shell script (i.e., sh not bash or such). The way I'd do this in Bash would be to use arrays, but of course the Bourne Shell doesn't have support for arrays.
At the cost of iterating over the original arguments twice, you can define a function that can run a simple command using only the even or odd arguments. This allows us to use the function's arguments as an additional array.
# Usage:
# run_it <cmd> [even|odd] ...
#
# Runs <cmd> using only the even or odd arguments, as specified.
run_it () {
cmd=${1:?Missing command name}
parity=${2:?Missing parity}
shift 2
n=$#
# Collect the odd arguments by discarding the first
# one, turning the odd arguments into the even arguments.
if [ $# -ge 1 ] && [ $parity = odd ]; then
shift
n=$((n - 1))
fi
# Repeatedly move the first argument to the
# end of the list and discard the second argument.
# Keep going until you have moved or discarded each argument.
while [ "$n" -gt 0 ]; do
x=$1
if [ $n -ge 2 ]; then
shift 2
else
shift
fi
set -- "$@" "$x"
n=$((n-2))
done
# Run the given command with the arguments that are left.
"$cmd" "$@"
}
# Example command
cmd () {
printf '%s\n' "$@"
}
# Example of using run_it
run_it cmd even "$@"
run_it cmd odd "$@"
This might be what you need. Alas, it uses eval. YMMV.
#!/bin/sh
# Samples
foo() { showme foo "$@"; }
bar() { showme bar "$@"; }
showme() {
echo "$1 args:"
shift
local c=0
while [ $# -gt 0 ]; do
printf '\t%-3d %s\n' $((c=c+1)) "$1"
shift
done
}
while [ $# -gt 0 ]; do
foo="$foo \"$1\""
bar="$bar \"$2\""
shift 2
done
eval foo $foo
eval bar $bar
There's no magic here -- we simply encode alternating arguments with quote armour into variables so they'll be processed correctly when you eval the line.
I tested this with FreeBSD's /bin/sh, which is based on ash. The shell is close to POSIX.1 but is not necessarily "Bourne". If your shell doesn't accept arguments to shift, you can simply shift twice in the while loop. Similarly, the showme() function increments a counter, an action which can be achieved in whatever way is your favourite if mine doesn't work for you. I believe everything else is pretty standard.
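A variation on the eval idea that avoids quoting the argument values at all, sketched under the assumption that 1-based odd/even positions are wanted: build strings of references such as "${1}" "${3}" and let eval expand them, so the argument contents themselves are never re-parsed. prog_odd and prog_even are hypothetical stand-ins for the two programs.

```shell
#!/bin/sh
# Hypothetical stand-ins for the two programs; each prints its arguments.
prog_odd()  { printf 'odd:  <%s>\n' "$@"; }
prog_even() { printf 'even: <%s>\n' "$@"; }

split_args() {
    sa_odd=
    sa_even=
    sa_i=1
    while [ "$sa_i" -le "$#" ]; do
        # Append a literal reference like "${3}", not the argument's value.
        if [ $((sa_i % 2)) -eq 1 ]; then
            sa_odd="$sa_odd \"\${$sa_i}\""
        else
            sa_even="$sa_even \"\${$sa_i}\""
        fi
        sa_i=$((sa_i + 1))
    done
    # eval only sees the references; they expand against this function's
    # own positional parameters, one word per argument.
    eval "prog_odd $sa_odd"
    eval "prog_even $sa_even"
}

split_args abc "def ghi" "foo bar" baz
```

Since eval never sees the argument text, embedded quotes or dollar signs in the arguments should not need any escaping, and an odd number of arguments is handled without a failing shift.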

posix sh: how to count number of occurrences in a string without using external tools?

In bash, it can be done like this:
#!/bin/bash
query='bengal'
string_to_search='bengal,toyger,bengal,persian,bengal'
delimiter='|'
replace_queries="${string_to_search//"$query"/"$delimiter"}"
delimiter_count="${replace_queries//[^"$delimiter"]}"
delimiter_count="${#delimiter_count}"
echo "Found $delimiter_count occurrences of \"$query\""
Output:
Found 3 occurrences of "bengal"
The caveat of course is that the delimiter cannot occur in 'query' or 'string_to_search'.
In POSIX sh, string replacement is not supported. Is there a way this can be done in POSIX sh using only shell builtins?
#!/bin/sh
query='bengal'
string_to_search='bengal,toyger,bengal,persian,bengal'
ct() (
n=0
IFS=,
q=$1
set -f # no globbing while splitting
set -- $2 # split on commas (IFS) into the positional parameters
for t in "$@"; do
if [ "$t" = "$q" ]; then
n=$((n + 1))
fi
done
echo $n
)
n=$(ct "$query" "$string_to_search")
printf 'found %d %s\n' "$n" "$query"
Though I'm not sure what the point is. If you've got a posix shell,
you also almost certainly have printf, sed, grep, and wc.
printf '%s\n' "$string_to_search" | tr ',' '\n' | grep -Fx "$query" | wc -l
Think I got it...
#!/bin/sh
query='bengal'
string_to_search='bengal,toyger,bengal,persian,bengal'
i=0
process_string="$string_to_search"
while [ -n "$process_string" ]; do
case "$process_string" in
*"$query"*)
process_string="${process_string#*"$query"}"
i="$(( i + 1 ))"
;;
*)
break
;;
esac
done
echo "Found $i occurrences of \"$query\""
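Wrapped as a function, the loop above might look like the sketch below. The guard for an empty query is my addition: without it the pattern *""* always matches and nothing is ever stripped, so the loop would not terminate. The function name is made up, and overlapping matches are not counted.

```shell
#!/bin/sh
# count_occurrences STRING QUERY
# Print how many non-overlapping times QUERY occurs in STRING,
# using only case and parameter expansion.
count_occurrences() {
    co_rest=$1
    co_query=$2
    co_n=0
    if [ -z "$co_query" ]; then
        echo 0                   # an empty query would loop forever
        return
    fi
    while :; do
        case $co_rest in
            *"$co_query"*)
                # Drop everything up to and including the first match.
                co_rest=${co_rest#*"$co_query"}
                co_n=$((co_n + 1))
                ;;
            *) break ;;
        esac
    done
    echo "$co_n"
}

count_occurrences 'bengal,toyger,bengal,persian,bengal' 'bengal'
```

Because the query is quoted inside both the case pattern and the expansion, characters such as * or ? in it are matched literally.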

Merging newline separated strings

Let's say I have two variables foo and bar containing the same number of newline separated strings, for instance
$ echo "$foo"
a
b
c
$ echo "$bar"
x
y
z
What is the simplest way to merge foo and bar to get the output below?
a x
b y
c z
If foo and bar were files I could do paste -d ' ' foo bar but in this case they are strings.
You can use process substitution in Bash to do this (not POSIX compliant):
foo=$'a\nb\nc'
bar=$'x\ny\nz'
paste -d ' ' <(printf '%s\n' "$foo") <(printf '%s\n' "$bar")
Outputs:
a x
b y
c z
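A variant that avoids process substitution is a little more convoluted; note that it still relies on Bash features ($'...' quoting, read -u and here-strings), so it is not POSIX sh either: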
foo=$'a\nb\nc'
bar=$'x\ny\nz'
res=$(while IFS=$'\n' read -u 3 -r f1 && IFS=$'\n' read -u 4 -r f2; do
printf '%s' "$f1"
printf ' %s\n' "$f2"
done 3<<<"$foo" 4<<<"$bar"
)
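For a version that sticks to POSIX sh features, one option is to feed each variable through a here-document on its own file descriptor. A sketch, with merge_lines as a made-up function name:

```shell
#!/bin/sh
foo='a
b
c'
bar='x
y
z'

# merge_lines A B
# Pair up the lines of A and B, joined by a single space.
merge_lines() {
    # fd 3 carries the lines of $1, fd 4 the lines of $2.
    while IFS= read -r ml_f1 <&3 && IFS= read -r ml_f2 <&4; do
        printf '%s %s\n' "$ml_f1" "$ml_f2"
    done 3<<EOF_A 4<<EOF_B
$1
EOF_A
$2
EOF_B
}

merge_lines "$foo" "$bar"
```

The loop stops as soon as either input runs out of lines, so surplus lines in the longer variable are dropped.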

sed: replace spaces within quotes with underscores

I have input (for example, from ifconfig run0 scan on OpenBSD) that has some fields that are separated by spaces, but some of the fields themselves contain spaces (luckily, such fields that contain spaces are always enclosed in quotes).
I need to distinguish between the spaces within the quotes, and the separator spaces. The idea is to replace spaces within quotes with underscores.
Sample data:
%cat /tmp/ifconfig_scan | fgrep nwid | cut -f3
nwid Websense chan 6 bssid 00:22:7f:xx:xx:xx 59dB 54M short_preamble,short_slottime
nwid ZyXEL chan 8 bssid cc:5d:4e:xx:xx:xx 5dB 54M privacy,short_slottime
nwid "myTouch 4G Hotspot" chan 11 bssid d8:b3:77:xx:xx:xx 49dB 54M privacy,short_slottime
Which doesn't end up processed the way I want, since I haven't replaced the spaces within the quotes with the underscores yet:
%cat /tmp/ifconfig_scan | fgrep nwid | cut -f3 |\
cut -s -d ' ' -f 2,4,6,7,8 | sort -n -k4
"myTouch Hotspot" 11 bssid d8:b3:77:xx:xx:xx
ZyXEL 8 cc:5d:4e:xx:xx:xx 5dB 54M
Websense 6 00:22:7f:xx:xx:xx 59dB 54M
For a sed-only solution (which I don't necessarily advocate), try:
echo 'a b "c d e" f g "h i"' |\
sed ':a;s/^\(\([^"]*"[^"]*"[^"]*\)*[^"]*"[^"]*\) /\1_/;ta'
a b "c_d_e" f g "h_i"
Translation:
Start at the beginning of the line.
Look for the pattern junk"junk", repeated zero or more times, where junk doesn't have a quote, followed by junk"junk space.
Replace the final space with _.
If successful, jump back to the beginning.
Try this:
awk -F'"' '{for(i=2;i<=NF;i++)if(i%2==0)gsub(" ","_",$i);}1' OFS="\"" file
It also works when a line contains several quoted parts:
echo '"first part" foo "2nd part" bar "the 3rd part comes" baz'| awk -F'"' '{for(i=2;i<=NF;i++)if(i%2==0)gsub(" ","_",$i);}1' OFS="\""
"first_part" foo "2nd_part" bar "the_3rd_part_comes" baz
EDIT alternative form:
awk 'BEGIN{FS=OFS="\""} {for(i=2;i<NF;i+=2)gsub(" ","_",$i)} 1' file
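To connect this back to the question, here is a rough sketch of the quote-splitting awk dropped into the original pipeline. The function name underscore_quoted is hypothetical, and the input line is the sample from the question rather than live ifconfig output.

```shell
#!/bin/sh
# Replace spaces with underscores inside double-quoted parts of each line.
# With " as the field separator, even-numbered fields are the quoted parts;
# i < NF leaves an unterminated trailing quote untouched.
underscore_quoted() {
    awk 'BEGIN { FS = OFS = "\"" }
         { for (i = 2; i < NF; i += 2) gsub(/ /, "_", $i) }
         1'
}

printf '%s\n' 'nwid "myTouch 4G Hotspot" chan 11 bssid d8:b3:77:xx:xx:xx 49dB 54M privacy,short_slottime' |
    underscore_quoted |
    cut -s -d ' ' -f 2,4,6,7,8
```

Once the quoted name no longer contains spaces, cut sees it as a single field, so the selected columns line up with those of the unquoted network names.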
Another awk to try:
awk '!(NR%2){gsub(FS,"_")}1' RS=\" ORS=\"
Removing the quotes:
awk '!(NR%2){gsub(FS,"_")}1' RS=\" ORS=
Some additional testing with a test file three times the size, following up on the earlier tests done by @steve. I had to adapt the sed statement a little so that non-GNU seds could process it as well. I included awk (bwk), gawk3, gawk4 and mawk:
$ for i in {1..1500000}; do echo 'a b "c d e" f g "h i" j k l "m n o "p q r" s t" u v "w x" y z' ; done > test
$ time perl -pe 's:"[^"]*":($x=$&)=~s/ /_/g;$x:ge' test >/dev/null
real 0m27.802s
user 0m27.588s
sys 0m0.177s
$ time awk 'BEGIN{FS=OFS="\""} {for(i=2;i<NF;i+=2)gsub(" ","_",$i)} 1' test >/dev/null
real 0m6.565s
user 0m6.500s
sys 0m0.059s
$ time gawk3 'BEGIN{FS=OFS="\""} {for(i=2;i<NF;i+=2)gsub(" ","_",$i)} 1' test >/dev/null
real 0m21.486s
user 0m18.326s
sys 0m2.658s
$ time gawk4 'BEGIN{FS=OFS="\""} {for(i=2;i<NF;i+=2)gsub(" ","_",$i)} 1' test >/dev/null
real 0m14.270s
user 0m14.173s
sys 0m0.083s
$ time mawk 'BEGIN{FS=OFS="\""} {for(i=2;i<NF;i+=2)gsub(" ","_",$i)} 1' test >/dev/null
real 0m4.251s
user 0m4.193s
sys 0m0.053s
$ time awk '!(NR%2){gsub(FS,"_")}1' RS=\" ORS=\" test >/dev/null
real 0m13.229s
user 0m13.141s
sys 0m0.075s
$ time gawk3 '!(NR%2){gsub(FS,"_")}1' RS=\" ORS=\" test >/dev/null
real 0m33.965s
user 0m26.822s
sys 0m7.108s
$ time gawk4 '!(NR%2){gsub(FS,"_")}1' RS=\" ORS=\" test >/dev/null
real 0m15.437s
user 0m15.328s
sys 0m0.087s
$ time mawk '!(NR%2){gsub(FS,"_")}1' RS=\" ORS=\" test >/dev/null
real 0m4.002s
user 0m3.948s
sys 0m0.051s
$ time sed -e :a -e 's/^\(\([^"]*"[^"]*"[^"]*\)*[^"]*"[^"]*\) /\1_/;ta' test > /dev/null
real 5m14.008s
user 5m13.082s
sys 0m0.580s
$ time gsed -e :a -e 's/^\(\([^"]*"[^"]*"[^"]*\)*[^"]*"[^"]*\) /\1_/;ta' test > /dev/null
real 4m11.026s
user 4m10.318s
sys 0m0.463s
mawk rendered the fastest results...
You'd be better off with perl. The code is much more readable and maintainable:
perl -pe 's:"[^"]*":($x=$&)=~s/ /_/g;$x:ge'
With your input, the results are:
a b "c_d_e" f g "h_i"
Explanation:
-p # enable printing
-e # the following expression...
s # begin a substitution
: # the first substitution delimiter
"[^"]*" # match a double quote followed by anything not a double quote any
# number of times followed by a double quote
: # the second substitution delimiter
($x=$&)=~s/ /_/g; # copy the pattern match ($&) into a variable ($x), then
# replace every space in $x with an underscore. The
# variable $x is needed because capture groups and
# patterns are read-only variables.
$x # return $x as the replacement.
: # the last delimiter
g # perform the nested substitution globally
e # make sure that the replacement is handled as an expression
Some testing:
for i in {1..500000}; do echo 'a b "c d e" f g "h i" j k l "m n o "p q r" s t" u v "w x" y z' >> test; done
time perl -pe 's:"[^"]*":($x=$&)=~s/ /_/g;$x:ge' test >/dev/null
real 0m8.301s
user 0m8.273s
sys 0m0.020s
time awk 'BEGIN{FS=OFS="\""} {for(i=2;i<NF;i+=2)gsub(" ","_",$i)} 1' test >/dev/null
real 0m4.967s
user 0m4.924s
sys 0m0.036s
time awk '!(NR%2){gsub(FS,"_")}1' RS=\" ORS=\" test >/dev/null
real 0m4.336s
user 0m4.244s
sys 0m0.056s
time sed ':a;s/^\(\([^"]*"[^"]*"[^"]*\)*[^"]*"[^"]*\) /\1_/;ta' test >/dev/null
real 2m26.101s
user 2m25.925s
sys 0m0.100s
NOT AN ANSWER, just posting awk equivalent code for @steve's perl code in case anyone's interested (and to help me remember this in future):
@steve posted:
perl -pe 's:"[^\"]*":($x=$&)=~s/ /_/g;$x:ge'
and from reading @steve's explanation the briefest awk equivalent to that perl code (NOT the preferred awk solution - see @Kent's answer for that) would be the GNU awk:
gawk '{
head = ""
while ( match($0,"\"[^\"]*\"") ) {
head = head substr($0,1,RSTART-1) gensub(/ /,"_","g",substr($0,RSTART,RLENGTH))
$0 = substr($0,RSTART+RLENGTH)
}
print head $0
}'
which we get to by starting from a POSIX awk solution with more variables:
awk '{
head = ""
tail = $0
while ( match(tail,"\"[^\"]*\"") ) {
x = substr(tail,RSTART,RLENGTH)
gsub(/ /,"_",x)
head = head substr(tail,1,RSTART-1) x
tail = substr(tail,RSTART+RLENGTH)
}
print head tail
}'
and saving a line with GNU awk's gensub():
gawk '{
head = ""
tail = $0
while ( match(tail,"\"[^\"]*\"") ) {
x = gensub(/ /,"_","g",substr(tail,RSTART,RLENGTH))
head = head substr(tail,1,RSTART-1) x
tail = substr(tail,RSTART+RLENGTH)
}
print head tail
}'
and then getting rid of the variable x:
gawk '{
head = ""
tail = $0
while ( match(tail,"\"[^\"]*\"") ) {
head = head substr(tail,1,RSTART-1) gensub(/ /,"_","g",substr(tail,RSTART,RLENGTH))
tail = substr(tail,RSTART+RLENGTH)
}
print head tail
}'
and then getting rid of the variable "tail" if you don't need $0, NF, etc, left hanging around after the loop:
gawk '{
head = ""
while ( match($0,"\"[^\"]*\"") ) {
head = head substr($0,1,RSTART-1) gensub(/ /,"_","g",substr($0,RSTART,RLENGTH))
$0 = substr($0,RSTART+RLENGTH)
}
print head $0
}'