Subset a string in POSIX shell - sh

I have a variable set in the following format:
var1="word1 word2 word3"
Is it possible to subset/delete one of the space-delimited words portably? What I want to achieve is something like this:
When the --ignore option is supplied with one of the following arguments:
$ cmd --ignore word1 # case 1
$ cmd --ignore "word1 word2" # case2
I want var1 to be changed so that it holds only the following value:
"word2 word3" # case1
"word3" #case2
If there is no way to achieve the above, is there a way to improve the efficiency of the following for loop? ($var1 is iterated over in a for loop, so my fallback idea was the following code.)
# while loop to get argument from options
# argument of `--ignore` is assigned to `$sh_ignore`
for i in $var1
do
# check $i in $sh_ignore instead of other way around
# to avoid unmatch when $sh_ignore has more than 1 word
if ! echo "$sh_ignore" | grep "$i";
then
# normal actions
else
# skipped
fi
done
-------Update-------
After looking around and reading the comment by @chepner, I am now temporarily using the following code (and am looking for improvements):
sh_ignore=''
while :; do
case $1 in
# some other option handling
--ignore)
if [ "$2" ]; then
sh_ignore=$2
shift
else
# defined `die` as print err msg + exit 1
die 'ERROR: "--ignore" requires a non-empty option argument.'
fi
;;
# handling if no arg is supplied to --ignore
# handling -- and unknown opt
esac
shift
done
if [ -n "$sh_ignore" ]; then
for d in $sh_ignore
do
var1="$(echo "$var1" | sed -e "s,$d,,")"
done
fi
# for loop with trimmed $var1 as downstream
for i in $var1
do
# normal actions
done

One method might be:
var1=$(echo "$var1" |
tr ' ' '\n' |
grep -Fxv -e "$(echo "$sh_ignore" | tr ' ' '\n')" |
tr '\n' ' ')
Note: this will leave a trailing blank, which can be trimmed off via var1=${var1% }
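If external tools are to be avoided altogether, the same filtering can be sketched with shell builtins only. The helper name remove_words is hypothetical, and the sketch assumes that the words contain no glob characters and that the ignore list is separated by single spaces:

```shell
#!/bin/sh
# Hypothetical helper: print the words of $1 that are not listed in $2.
# Assumes single-space-separated words without glob characters.
remove_words() {
    result=
    for w in $1; do
        case " $2 " in
            *" $w "*) ;;                          # in the ignore list: skip it
            *) result="${result:+$result }$w" ;;  # keep it, space-separated
        esac
    done
    printf '%s\n' "$result"
}

var1="word1 word2 word3"
sh_ignore="word1 word2"
var1=$(remove_words "$var1" "$sh_ignore")
printf '%s\n' "$var1"
```

Because the ignore list is padded with spaces before matching, only whole words are removed, and no trailing blank is left behind.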

Related

Replacing characters in a sh script

I am writing an sh script and need to replace the . and - with a _
Current:
V123_45_678_910.11_1213-1415.sh
Wanted:
V123_45_678_910_11_1213_1415.sh
I have used a few mv commands, but I am having trouble.
for file in /virtualun/rest/scripts/IOL_Extra/*.sh ; do mv $file ${file//V15_IOL_NVMe_01./V15_IOL_NVMe_01_} ; done
You don't need to match any of the other parts of the file name, just the characters you want to replace. To avoid turning foo.sh into foo-sh, remove the extension first, then add it back to the result of the replacement.
for file in /virtualun/rest/scripts/IOL_Extra/*.sh ; do
base=${file%.sh}
mv -i -- "$file" "${base//[-.]/_}".sh
done
Use the -i option to make sure you don't inadvertently replace one file with another when the modified names coincide.
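Note that ${base//[-.]/_} is a bash extension. For a plain sh script, a rough equivalent can be sketched with tr. The helper name new_name is hypothetical, and basename and dirname are assumed to accept -- as they do elsewhere on this page:

```shell
#!/bin/sh
# Hypothetical helper: print the new path for a *.sh file, replacing . and -
# in the file name only (not in the directory part or the extension).
new_name() {
    dir=$(dirname -- "$1")
    base=$(basename -- "$1" .sh)
    printf '%s/%s.sh\n' "$dir" "$(printf '%s' "$base" | tr '.-' '__')"
}

new_name /virtualun/rest/scripts/IOL_Extra/V123_45_678_910.11_1213-1415.sh

# Possible use in the loop (not executed here):
#   for file in /virtualun/rest/scripts/IOL_Extra/*.sh; do
#       mv -i -- "$file" "$(new_name "$file")"
#   done
```

Working on the base name alone keeps dots or hyphens in directory names from being rewritten.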
This should work:
#!/usr/bin/env sh
# Fail on error
set -o errexit
# Disable undefined variable reference
set -o nounset
# Enable wildcard character expansion
set +o noglob
# ================
# CONFIGURATION
# ================
# Pattern
PATTERN="/virtualun/rest/scripts/IOL_Extra/*.sh"
# ================
# LOGGER
# ================
# Fatal log message
fatal() {
printf '[FATAL] %s\n' "$@" >&2
exit 1
}
# Info log message
info() {
printf '[INFO ] %s\n' "$@"
}
# ================
# MAIN
# ================
{
# Check directory exists
[ -d "$(dirname "$PATTERN")" ] || fatal "Directory '$(dirname "$PATTERN")' does not exist"
for _file in $PATTERN; do
# Skip if not file
[ -f "$_file" ] || continue
info "Analyzing file '$_file'"
# File data
_file_dirname=$(dirname -- "$_file")
_file_basename=$(basename -- "$_file")
_file_name="${_file_basename%.*}"
_file_extension=
case $_file_basename in
*.*) _file_extension=".${_file_basename##*.}" ;;
esac
# New file name
_new_file_name=$(printf '%s\n' "$_file_name" | sed 's/[.-][.-]*/_/g')
# Skip if equals
[ "$_file_name" != "$_new_file_name" ] || continue
# New file
_new_file="$_file_dirname/${_new_file_name}${_file_extension}"
# Rename
info "Renaming file '$_file' to '$_new_file'"
mv -i -- "$_file" "$_new_file"
done
}
You can try this:
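for f in /virtualun/rest/scripts/IOL_Extra/*.sh; do
base=${f%.sh}
mv -- "$f" "$(printf '%s\n' "$base" | sed 's/[.-]/_/g').sh"
done
The sed command replaces every . and - with _. The .sh extension is stripped first and added back afterwards so that its dot is left alone.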
I prefer using the sed substitution posted by oliv.
However, if you are not familiar with regular expressions, rename is quicker and easier to understand:
Example:
$ touch V123_45_678_910.11_1213-1415.sh
$ rename -va '.' '_' *sh
`V123_45_678_910.11_1213-1415.sh' -> `V123_45_678_910_11_1213-1415_sh'
$ rename -va '-' '_' *sh
`V123_45_678_910_11_1213-1415_sh' -> `V123_45_678_910_11_1213_1415_sh'
$ rename -vl '_sh' '.sh' *sh
`V123_45_678_910_11_1213_1415_sh' -> `V123_45_678_910_11_1213_1415.sh'
$ ls *sh
V123_45_678_910_11_1213_1415.sh
Options explained:
-v prints the name of the file before -> after the operation
-a replaces all occurrences of the first argument with the second argument
-l replaces the last occurrence of the first argument with the second argument
Note that this might not be suitable depending on the other files you have in the given directory that would match *sh and that you do NOT want to rename.

posix sh: how to count number of occurrences in a string without using external tools?

In bash, it can be done like this:
#!/bin/bash
query='bengal'
string_to_search='bengal,toyger,bengal,persian,bengal'
delimiter='|'
replace_queries="${string_to_search//"$query"/"$delimiter"}"
delimiter_count="${replace_queries//[^"$delimiter"]}"
delimiter_count="${#delimiter_count}"
echo "Found $delimiter_count occurrences of \"$query\""
Output:
Found 3 occurrences of "bengal"
The caveat of course is that the delimiter cannot occur in 'query' or 'string_to_search'.
In POSIX sh, string replacement is not supported. Is there a way this can be done in POSIX sh using only shell builtins?
#!/bin/sh
query='bengal'
string_to_search='bengal,toyger,bengal,persian,bengal'
ct() (
n=0
IFS=,
q=$1
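set -f # no globbing while the string is split
set -- $2 # split on IFS into the positional parameters
for t in "$@"; do
if [ "$t" = "$q" ]; then
n=$((n + 1))
fi
done
echo "$n"
)
n=$(ct "$query" "$string_to_search")
printf "found %d %s\n" "$n" "$query"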
Though I'm not sure what the point is. If you've got a POSIX shell, you also almost certainly have printf, tr, grep, and wc.
printf '%s\n' "$string_to_search" | tr ',' '\n' | grep -Fx -e "$query" | wc -l
Think I got it...
#!/bin/sh
query='bengal'
string_to_search='bengal,toyger,bengal,persian,bengal'
i=0
process_string="$string_to_search"
while [ -n "$process_string" ]; do
case "$process_string" in
*"$query"*)
process_string="${process_string#*"$query"}"
i="$(( i + 1 ))"
;;
*)
break
;;
esac
done
echo "Found $i occurrences of \"$query\""
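The loop above can be wrapped in a function, sketched here under the hypothetical name count_occurrences. One addition is a guard for an empty query: an empty string matches everywhere and never shortens the remaining text, so the loop would otherwise never end. Like the loop it is based on, this counts non-overlapping substring matches, not whole comma-separated fields.

```shell
#!/bin/sh
# Hypothetical wrapper: print the number of non-overlapping occurrences
# of $1 in $2, using shell builtins only.
count_occurrences() {
    needle=$1
    rest=$2
    count=0
    # An empty needle would match forever without consuming anything.
    [ -n "$needle" ] || { echo 0; return; }
    while :; do
        case $rest in
            *"$needle"*)
                rest=${rest#*"$needle"}   # drop everything up to the first match
                count=$((count + 1))
                ;;
            *) break ;;
        esac
    done
    echo "$count"
}

count_occurrences bengal 'bengal,toyger,bengal,persian,bengal'
```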

How to remove YAML frontmatter from markdown files?

I have markdown files that contain YAML frontmatter metadata, like this:
---
title: Something Somethingelse
author: Somebody Sometheson
---
But the YAML is of varying widths. Can I use a POSIX command like sed to remove that frontmatter when it's at the beginning of a file? Something that just removes everything between --- and ---, inclusive, but also ignores the rest of the file, in case there are ---s elsewhere.
I understand your question to mean that you want to remove the first block enclosed in --- lines if it starts at the first line. In that case,
sed '1 { /^---/ { :a N; /\n---/! ba; d} }' filename
This is:
1 { # in the first line
/^---/ { # if it starts with ---
:a # jump label for looping
N # fetch the next line, append to pattern space
/\n---/! ba; # if the result does not contain \n--- (that is, if the last
# fetched line does not begin with ---), go back to :a
d # then delete the whole thing.
}
}
# otherwise drop off the end here and do the default (print
# the line)
Depending on how you want to handle lines that begin with ---abc or so, you may have to change the patterns a little (perhaps add $ at the end to only match when the whole line is ---). I'm a bit unclear on your precise requirements there.
If you want to remove only the front matter, you could simply run:
sed '1{/^---$/!q;};1,/^---$/d' infile
If the first line doesn't match ---, sed will quit; else it will delete everything from the 1st line up to (and including) the next line matching --- (i.e. the entire front matter).
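As a quick check of that behaviour, the command can be run against a throwaway sample. This is only a sketch: the sample content is made up, and mktemp is assumed to be available (it is common but not part of POSIX).

```shell
#!/bin/sh
# Build a throwaway sample: front matter, a body line, and a later --- rule.
sample=$(mktemp)
printf '%s\n' '---' 'title: Something' '---' 'Body line' '---' 'After rule' > "$sample"

# Same sed command as above; only the leading block should disappear.
stripped=$(sed '1{/^---$/!q;};1,/^---$/d' "$sample")
printf '%s\n' "$stripped"

rm -f "$sample"
```

The later --- line survives because the range closes at the first --- that follows line 1. Keep in mind that for input without front matter the q command prints the first line and stops, so the rest of that input is not written to standard output.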
If you don't mind the "or something" being perl.
Simply print after two instances of "---" have been found:
perl -ne 'if ($i > 1) { print } else { /^---/ && $i++ }' yaml
or a bit shorter if you don't mind abusing ?: for flow control:
perl -ne '$i > 1 ? print : /^---/ && $i++' yaml
Be sure to include -i if you want to replace inline.
You can use a bash script: create script.sh, make it executable with chmod +x script.sh, and run it with ./script.sh.
#!/bin/bash
#folder articles contains a lot of markdown files
files=./articles/*.md
for f in $files;
do
#filename
echo "${f##*/}"
#replace frontmatter title attribute to "title"
sed -i -r 's/^title: (.*)$/title: "\1"/' "$f"
#...
done
This AWK-based solution works for files with and without front matter, doing nothing in the latter case.
#!/bin/sh
# Strips YAML FrontMatter from a file (usually Markdown).
# Exit immediately on error (-E is not a POSIX sh option, and -u is
# omitted because "$1" and "$2" may legitimately be unset below);
# see: https://vaneyckt.io/posts/safer_bash_scripts_with_set_euxo_pipefail/
set -e
print_help() {
echo "Strips YAML FrontMatter from a file (usually Markdown)."
echo
echo "Usage:"
echo " `basename $0` -h"
echo " `basename $0` --help"
echo " `basename $0` -i <file-with-front-matter>"
echo " `basename $0` --in-place <file-with-front-matter>"
echo " `basename $0` <file-with-front-matter> <file-to-be-without-front-matter>"
}
replace=false
in_file="-"
out_file="/dev/stdout"
if [ -n "$1" ]
then
if [ "$1" = "-h" ] || [ "$1" = "--help" ]
then
print_help
exit 0
elif [ "$1" = "-i" ] || [ "$1" = "--in-place" ]
then
replace=true
in_file="$2"
out_file="$in_file"
else
in_file="$1"
if [ -n "$2" ]
then
out_file="$2"
fi
fi
fi
tmp_out_file="$out_file"
if $replace
then
tmp_out_file="${in_file}_tmp"
fi
awk '
BEGIN {
in_fm=0;
}
NR == 1 && /^---$/ {
# front matter only counts when it opens on the very first line
in_fm=1;
next;
}
in_fm && /^(---|\.\.\.)$/ {
# closing delimiter: drop it and resume printing
in_fm=0;
next;
}
{
if (! in_fm) {
print $0;
}
}
' "$in_file" > "$tmp_out_file"
if $replace
then
mv "$tmp_out_file" "$out_file"
fi
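The awk filter at the heart of such a script can also be tried on its own. The following is a minimal sketch, with the function name strip_front_matter chosen here purely for illustration. It treats a --- on line 1 as the opening delimiter and the next --- or ... line as the closing one:

```shell
#!/bin/sh
# Hypothetical helper: strip leading YAML front matter from stdin or files.
strip_front_matter() {
    awk '
        NR == 1 && /^---$/ { in_fm = 1; next }          # opening delimiter on line 1
        in_fm && /^(---|\.\.\.)$/ { in_fm = 0; next }   # closing delimiter
        !in_fm                                          # print everything else
    ' "$@"
}

# With front matter: only the body remains.
printf '%s\n' '---' 'title: x' '---' 'body' | strip_front_matter

# Without front matter: the input passes through, including a later --- line.
printf '%s\n' 'no front matter' '---' 'still body' | strip_front_matter
```

Keying the opening test on NR == 1 is what keeps a horizontal rule further down the document from being mistaken for front matter.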

Finding multiple strings on multiple lines in file and manipulating output with bash/perl

I am trying to get the version numbers for content management systems being hosted on my server. I can do this fairly simply if the version number is stored on one line with something like this:
grep -r "\$wp_version = '" /home/
Which returns exactly what I want to stdout:
/home/$RANDOMDOMAIN/wp-includes/version.php:$wp_version = '3.7.1';
The issue I run into is when I start looking for version numbers that are stored on two or more lines, like Joomla! or Magento which use the following formats respectively:
Joomla!:
/** @var string Release version. */
public $RELEASE = '3.2';
/** @var string Maintenance version. */
public $DEV_LEVEL = '3';
Magento:
'major' => '1',
'minor' => '8',
'revision' => '1',
'patch' => '0',
I have gotten it to 'work', in a way, using the following (With this method if, for whatever reason, one of the strings I am looking for is missing the whole command becomes useless since xargs -l3 is expecting 2 rows above the path provided by -print):
find /home/ -type f -name version.php -exec grep " \$RELEASE " '{}' \; -exec grep " \$DEV_LEVEL " '{}' \; -print | xargs -l3 | sed 's/\<var\>\s//g;s/\<public\>\s//g' | awk -F\; '{print $3":"$1""$2}' | sed 's/ $DEV_LEVEL = /./g'
Which gets me output like this:
/home/$RANDOMDOMAIN/version.php:$RELEASE = 3.2.3
/home/$RANDOMDOMAIN/anotherfolder/version.php:$RELEASE = 1.5.0
I also have a working for loop that WILL exclude any file that does not contain both strings, but depending how much it has to sift through, can take significantly longer than the find one liner above:
for path in $(grep -rl " \$RELEASE " /home/ 2> /dev/null | xargs grep -rl " \$DEV_LEVEL ")
do
joomlaver="$path"
joomlaver+=$(grep " \$RELEASE " $path)
joomlaver+=$(echo " \$DEV_LEVEL = '$(grep " \$DEV_LEVEL " $path | cut -d\' -f2)';")
echo "$joomlaver" | sed 's/\<var\>\s//g;s/\<public\>\s//g;s/;//g' | awk -F\' '{ print $1""$2"."$4 }' | sed 's/\s\+//g'
unset joomlaver
done
Which gets me output like this:
/home/$RANDOMDOMAIN/version.php$RELEASE=3.2.3
/home/$RANDOMDOMAIN/anotherfolder/version.php$RELEASE=1.5.0
But I have to believe there is a simpler, shorter, more elegant way. Bash is preferred or if it can somehow be done with a perl one liner, that would work as well. Any and all help would be much appreciated. Thanks in advance. (Sorry for all the edits, but I am trying to figure this out myself as well!)
Here is a perl one-liner that will extract the $RELEASE and $DEV_LEVEL from the php file format you showed:
perl -ne '$v=$1 if /\$RELEASE\s*=\s*\047([0-9.]+)\047/; $devlevel=$1 if /\$DEV_LEVEL\s*=\s*\047([0-9.]+)\047/; if (defined $v && defined $devlevel) { print "$ARGV: Release=$v Devlevel=$devlevel\n"; last; }'
The -n makes perl effectively wrap the whole thing inside a while (<>) { } loop. Each line is checked against two regexes. If both of them have matched, it prints the result and exits.
The \047 is used to match single quotes, otherwise the shell would get confused.
If it does not find a match, it does not print anything. Otherwise it prints something like this:
sample.php: Release=3.2 Devlevel=3
You would use it in combination with find to traverse a directory structure. Because last ends the whole implicit loop, run perl once per file (-exec ... \;) rather than handing it many files at once through xargs, perhaps like this:
find . -name "*.php" -exec perl -ne '$v=$1 if /\$RELEASE\s*=\s*\047([0-9.]+)\047/; $devlevel=$1 if /\$DEV_LEVEL\s*=\s*\047([0-9.]+)\047/; if (defined $v && defined $devlevel) { print "$ARGV: Release=$v Devlevel=$devlevel\n"; last; }' {} \;
You could make a similar version for the other file format (Magento?) you mentioned.
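For the Magento layout quoted in the question, a comparable extraction might be sketched with awk from a shell function. The function name magento_version and the output format are assumptions made for illustration, and the sketch assumes each key appears on its own line exactly as quoted in the question:

```shell
#!/bin/sh
# Hypothetical helper: print "file: major.minor.revision.patch" for a file
# containing lines such as  'major' => '1',
magento_version() {
    # Splitting on single quotes puts the key in $2 and the value in $4.
    awk -F"'" '
        $2 == "major" || $2 == "minor" || $2 == "revision" || $2 == "patch" {
            v[$2] = $4
        }
        END {
            # Print only when all four parts were found.
            if (("major" in v) && ("minor" in v) && ("revision" in v) && ("patch" in v)) {
                print FILENAME ": " v["major"] "." v["minor"] "." v["revision"] "." v["patch"]
            }
        }
    ' "$1"
}

# Throwaway sample using the layout from the question (mktemp assumed available).
sample=$(mktemp)
cat > "$sample" <<'EOF'
'major' => '1',
'minor' => '8',
'revision' => '1',
'patch' => '0',
EOF
result=$(magento_version "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

A file that lacks any of the four keys produces no output, which matches the behaviour of the perl one-liner when a match is missing.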

Any way to find if two adjacent new lines start with certain words?

Say I have a file like so:
+jaklfjdskalfjkdsaj
fkldsjafkljdkaljfsd
-jslakflkdsalfkdls;
+sdjafkdjsakfjdskal
I only want to find and count the number of times in this file that a line starting with - is immediately followed by a line starting with +.
Rules:
No external scripts
Must be done from within a bash script
Must be inline
I could figure out how to do this in a Python script, for instance, but I've never had to do something this extensive in Bash.
Could anyone help me out? I figure it'll end up being grep, perl, or maybe a talented sed line -- but these are things I'm still learning.
Thank you all!
grep -A1 "^-" "$file" | grep "^+" | wc -l
The first grep finds all of the lines starting with -, and the -A1 causes it to also output the line after the match too.
We then grep that output for any lines starting with +. Logically:
We know the output of the first grep is only the -XXX lines and the following lines
We know that a +xxx line cannot also be a -xxx line
Therefore, any +xxx lines must be following lines, and should be counted, which we do with wc -l
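Since -A is an extension rather than a POSIX grep option, the same count can also be sketched inline with awk by remembering whether the previous line began with -. The function name count_minus_plus is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: count lines starting with + that directly follow
# a line starting with -.
count_minus_plus() {
    awk '
        prev && /^[+]/ { count++ }   # previous line began with -, this one with +
        { prev = ($0 ~ /^-/) }       # remember whether this line begins with -
        END { print count + 0 }      # + 0 prints 0 instead of an empty string
    ' "$@"
}

printf '%s\n' '+jaklfjdskalfjkdsaj' 'fkldsjafkljdkaljfsd' '-jslakflkdsalfkdls;' '+sdjafkdjsakfjdskal' | count_minus_plus
```

For the four-line sample from the question this prints 1.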
Easy in Perl:
perl -lne '$c++ if $p and /^\+/; $p = /^-/ }{ print $c' FILE
awk one-liner:
awk -v FS='' '{x=x sprintf("%s", $1)}END{print gsub(/-\+/,"",x)}' file
e.g.
kent$ cat file
+jaklfjdskalfjkdsaj
fkldsjafkljdkaljfsd
-jslakflkdsalfkdls;
+sdjafkdjsakfjdskal
-
-
-
+
-
+
foo
+
kent$ awk -v FS='' '{x=x sprintf("%s", $1)}END{print gsub(/-\+/,"",x)}' file
3
Another Perl example. Not as terse as choroba's, but more transparent in how it works:
perl -e'while (<>) { $last = $cur; $cur = $_; print $last, $cur if substr($last, 0, 1) eq "-" && substr($cur, 0, 1) eq "+" }' < infile
Output:
-jslakflkdsalfkdls;
+sdjafkdjsakfjdskal
Pure bash:
c=0 p=1
while IFS= read -r line ; do
[[ $line == +* && $p == 0 ]] && (( c++ ))
[[ $line == -* ]]
p=$?
done < FILE
echo "$c"