In bash, it can be done like this:
#!/bin/bash
query='bengal'
string_to_search='bengal,toyger,bengal,persian,bengal'
delimiter='|'
replace_queries="${string_to_search//"$query"/"$delimiter"}"
delimiter_count="${replace_queries//[^"$delimiter"]}"
delimiter_count="${#delimiter_count}"
echo "Found $delimiter_count occurences of \"$query\""
Output:
Found 3 occurences of "bengal"
The caveat of course is that the delimiter cannot occur in 'query' or 'string_to_search'.
In POSIX sh, string replacement is not supported. Is there a way this can be done in POSIX sh using only shell builtins?
#!/bin/sh
query='bengal'
string_to_search='bengal,toyger,bengal,persian,bengal'
ct() (
n=0
IFS=,
q=$1
set $2
for t in "$#"; do
if [ "$t" = "$q" ]; then
n=$((n + 1))
fi
done
echo $n
)
n=$(ct "$query" "$string_to_search")
printf "found %d %s\n" $n $query
Though I'm not sure what the point is. If you've got a posix shell,
you also almost certainly have printf, sed, grep, and wc.
printf '%s\n' "$string_to_search" | sed -e 's/,/\n/g' | grep -Fx "$query" | wc -l
Think I got it...
#!/bin/sh
query='bengal'
string_to_search='bengal,toyger,bengal,persian,bengal'
i=0
process_string="$string_to_search"
while [ -n "$process_string" ]; do
case "$process_string" in
*"$query"*)
process_string="${process_string#*"$query"}"
i="$(( i + 1 ))"
;;
*)
break
;;
esac
done
echo "Found $i occurences of \"$query\""
I have markdown files that contain YAML frontmatter metadata, like this:
---
title: Something Somethingelse
author: Somebody Sometheson
---
But the YAML is of varying widths. Can I use a Posix command like sed to remove that frontmatter when it's at the beginning of a file? Something that just removes everything between --- and ---, inclusive, but also ignores the rest of the file, in case there are ---s elsewhere.
I understand your question to mean that you want to remove the first ----enclosed block if it starts at the first line. In that case,
sed '1 { /^---/ { :a N; /\n---/! ba; d} }' filename
This is:
1 { # in the first line
/^---/ { # if it starts with ---
:a # jump label for looping
N # fetch the next line, append to pattern space
/\n---/! ba; # if the result does not contain \n--- (that is, if the last
# fetched line does not begin with ---), go back to :a
d # then delete the whole thing.
}
}
# otherwise drop off the end here and do the default (print
# the line)
Depending on how you want to handle lines that begin with ---abc or so, you may have to change the patterns a little (perhaps add $ at the end to only match when the whole line is ---). I'm a bit unclear on your precise requirements there.
If you want to remove only the front matter, you could simply run:
sed '1{/^---$/!q;};1,/^---$/d' infile
If the first line doesn't match ---, sed will quit; else it will delete everything from the 1st line up to (and including) the next line matching --- (i.e. the entire front matter).
If you don't mind the "or something" being perl.
Simply print after two instances of "---" have been found:
perl -ne 'if ($i > 1) { print } else { /^---/ && $i++ }' yaml
or a bit shorter if you don't mind abusing ?: for flow control:
perl -ne '$i > 1 ? print : /^---/ && $i++' yaml
Be sure to include -i if you want to replace inline.
you use a bash file, create script.sh and make it executable using chmod +x script.sh and run it ./script.sh.
#!/bin/bash
#folder articles contains a lot of markdown files
files=./articles/*.md
for f in $files;
do
#filename
echo "${f##*/}"
#replace frontmatter title attribute to "title"
sed -i -r 's/^title: (.*)$/title: "\1"/' $f
#...
done
This AWK based solution works for files with and without FrontMatter, doing nothing in the later case.
#!/bin/sh
# Strips YAML FrontMattter from a file (usually Markdown).
# Exit immediately on each error and unset variable;
# see: https://vaneyckt.io/posts/safer_bash_scripts_with_set_euxo_pipefail/
set -Ee
print_help() {
echo "Strips YAML FrontMattter from a file (usually Markdown)."
echo
echo "Usage:"
echo " `basename $0` -h"
echo " `basename $0` --help"
echo " `basename $0` -i <file-with-front-matter>"
echo " `basename $0` --in-place <file-with-front-matter>"
echo " `basename $0` <file-with-front-matter> <file-to-be-without-front-matter>"
}
replace=false
in_file="-"
out_file="/dev/stdout"
if [ -n "$1" ]
then
if [ "$1" = "-h" ] || [ "$1" = "--help" ]
then
print_help
exit 0
elif [ "$1" = "-i" ] || [ "$1" = "--in-place" ]
then
replace=true
in_file="$2"
out_file="$in_file"
else
in_file="$1"
if [ -n "$2" ]
then
out_file="$2"
fi
fi
fi
tmp_out_file="$out_file"
if $replace
then
tmp_out_file="${in_file}_tmp"
fi
awk -e '
BEGIN {
is_first_line=1;
in_fm=0;
}
/^---$/ {
if (is_first_line) {
in_fm=1;
}
}
{
if (! in_fm) {
print $0;
}
}
/^(---|...)$/ {
if (! is_first_line) {
in_fm=0;
}
is_first_line=0;
}
' "$in_file" >> "$tmp_out_file"
if $replace
then
mv "$tmp_out_file" "$out_file"
fi
I have a text file with the following content
L,4m,06/03/2013
L,33GJm,06/03/2013,G
L,44Bm,06/03/2013,B
L,4q,08/03/2013
J,4m,04/03/2013
J,3GU,04/03/2013,G
J,3jm,04/03/2013
J,3GJ,04/03/2013,G
J,44Bm,06/03/2013,B
J,34Bq,08/03/2013,B
M,4v,12/03/2013
D,3GU,12/03/2013,G
D,4B,11/03/2013,B
D,4m,12/03/2013
D,3GJ,13/03/2013,G
D,3GU,13/03/2013,G
D,4B,14/03/2013,B
D,4B,14/03/2013,B
D,34Bm,14/03/2013,B
L,33BUq,11/03/2013,B
L,3BJUq,11/03/2013,B
L,44Bq,14/03/2013,B
L,44Bq,14/03/2013,B
L,3Bq,15/03/2013,B
L,3q,15/03/2013
J,34Bjq,11/03/2013,B
J,33GUm,12/03/2013,G
J,4q,13/03/2013
J,33GUq,13/03/2013,G
J,33GUq,13/03/2013,G
J,4q,13/03/2013
M,3BU,18/03/2013,B
M,4B,18/03/2013,B
M,4B,18/03/2013,B
M,3GJ,19/03/2013,G
M,3GJ,19/03/2013,G
D,4B,22/03/2013,B
D,3BU,22/03/2013,B
L,34Bv,18/03/2013,B
L,3jm,19/03/2013
L,4m,19/03/2013
L,33GJm,19/03/2013,G
L,33GUm,19/03/2013,G
J,33BUm,18/03/2013,B
J,4m,18/03/2013
J,4B,18/03/2013,B
J,33BUm,18/03/2013,B
J,4q,22/03/2013
J,4q,22/03/2013
A,3GJ,28/03/2013,G
M,4B,27/03/2013,B
D,4B,25/03/2013,B
L,44Bq,25/03/2013,B
L,34Bq,25/03/2013,B
L,34Bq,25/03/2013,B
L,33BUa,26/03/2013,B
L,33BUq,26/03/2013,B
L,33BUq,26/03/2013,B
L,34Bq,27/03/2013,B
L,34Bq,27/03/2013,B
L,4B,27/03/2013,B
L,34Bq,27/03/2013,B
L,4a,28/03/2013
I want to translate the second column based on the following coding system.
If $2 starts with a 1 or 2 - Change $2 to Excellent
If $2 contains 3BU or 3GU - Change $2 to Good
if $2 contains 3BJ or 3GJ - Change $2 to OK
If $2 starts with a 4 - Change $2 to Poor
If $2 starts with a 5 - Change $2 Terrible
I can find and change the 3BUs to Good easy enough using the following command
awk 'BEGIN{FS=",";OFS=","} {if ($2~ /3(B|G)U/)print $1,"Good",$3}' file | sponge file
Though I use all other non 3(B|G)U lines. I could use if else terminology though this seems inelegant. I have tried to use gensub to solve the problem
awk -F, '{gensub(/3(B|G)U/,Good,"",2)}1' file
But this prints the file contents without substitution. Any hints
Desired output
L,Poor,06/03/2013
L,Ok,06/03/2013,G
L,Poor,06/03/2013,B
L,Poor,08/03/2013
J,Poor,04/03/2013
J,Good,04/03/2013,G
A perl or sed one-liner would also be helpful as this code forms part of a bash shell script
If you want to stick with shell:
(
IFS=,
while read -ra f; do # pick more appropriate variable names
case ${f[1]} in
[12]*) f[1]=Excellent ;;
*3[BG]U*) f[1]=Good ;;
*3[BG]J*) f[1]=OK ;;
4*) f[1]=Poor ;;
5*) f[1]=Terrible ;;
esac
echo "${f[*]}"
done < file
) > tmp && mv tmp file
I ran that in a subshell to localize changes to $IFS
a sed solutions too
sed -e 's/\(^.,\)\(1\|2\)[^,]*/\1Excellent/g' -e 's/\(^.,\)3[BG]U[^,]*/\1Good/g' -e 's/\(^.,\)3[BG]J[^,]*/\1OK/g' -e 's/\(^.,\)4[^,]*/\1Poor/g' -e 's/\(^.,\)5[^,]*/\1Terrible/g' <filename>
$ awk '
BEGIN { FS=OFS="," }
$2 ~ /^(1|2)/ { $2 = "Excellent" }
$2 ~ /3(B|G)U/ { $2 = "Good" }
$2 ~ /3(B|G)J/ { $2 = "OK" }
$2 ~ /^4/ { $2 = "Poor" }
$2 ~ /^5/ { $2 = "Terrible" }
1
' foo.txt | head -n 10
L,Poor,06/03/2013
L,OK,06/03/2013,G
L,Poor,06/03/2013,B
L,Poor,08/03/2013
J,Poor,04/03/2013
J,Good,04/03/2013,G
J,3jm,04/03/2013
J,OK,04/03/2013,G
J,Poor,06/03/2013,B
J,34Bq,08/03/2013,B
perl -pe 's{,(\w+)}{ $_ = /^[12]/ ?"Excellent" :/3[BG]U/ ?"Good" :/3[BG]J/ ?"OK" :/^4/ ?"Poor" :/^5/ ?"Terrible" :$_ for $v=$1; ",$v" }e'
More readable version,
s{,(\w+)}{
for ($v = $1) {
$_ = /^[12]/ ?"Excellent"
:/3[BG]U/ ?"Good"
:/3[BG]J/ ?"OK"
:/^4/ ?"Poor"
:/^5/ ?"Terrible"
:$_;
}
",$v";
}e;
I'm trying to compress a text document by deleting of duplicated empty lines, with sed. This is what I'm doing (to no avail):
sed -i -E 's/\n{3,}/\n/g' file.txt
I understand that it's not correct, according to this manual, but I can't figure out how to do it correctly. Thanks.
I think you want to replace spans of multiple blank lines with a single blank line, even though your example replaces multiple runs of \n with a single \n instead of \n\n. With that in mind, here are two solutions:
sed '/^$/{ :l
N; s/^\n$//; t l
p; d; }' input
In many implementations of sed, that can be all on one line, with the embedded newlines replaced by ;.
awk 't || !/^$/; { t = !/^$/ }'
As tripleee suggested above, I'm using Perl instead of sed:
perl -0777pi -e 's/\n{3,}/\n\n/g'
Use the translate function
tr -s '\n'
the -s or --squeeze-repeats reduces a sequence of repeated character to a single instance.
This is much better handled by tr -s '\n' or cat -s, but if you insist on sed, here's an example from section 4.17 of the GNU sed manual:
#!/usr/bin/sed -f
# on empty lines, join with next
# Note there is a star in the regexp
:x
/^\n*$/ {
N
bx
}
# now, squeeze all '\n', this can be also done by:
# s/^\(\n\)*/\1/
s/\n*/\
/
I am not sure this is what the OP wanted but using the awk solution by William Pursell here is the approach if you want to delete ALL empty lines in the file:
awk '!/^$/' file.txt
Explanation:
The awk pattern
'!/^$/'
is testing whether the current line is consisting only of the beginning of a line (symbolised by '^') and the end of a line (symbolised by '$'), in other words, whether the line is empty.
If this pattern is true awk applies its default and prints the current line.
HTH
I think OP wants to compress empty lines, e.g. where there are 9 consecutive emty lines, he wants to have just three.
I have written a little bash script that does just that:
#! /bin/bash
TOTALLINES="$(cat file.txt|wc -l)"
CURRENTLINE=1
while [ $CURRENTLINE -le $TOTALLINES ]
do
L1=$CURRENTLINE
L2=$(($L1 + 1))
L3=$(($L1 +2))
if [[ $(cat file.txt|head -$L1|tail +$L1) == "" ]]||[[ $(cat file.txt|head -$L1|tail +$L1) == " " ]]
then
L1EMPTY=true
else
L1EMPTY=false
fi
if [[ $(cat file.txt|head -$L2|tail +$L2) == "" ]]||[[ $(cat file.txt|head -$L2|tail +$L2) == " " ]]
then
L2EMPTY=true
else
L2EMPTY=false
fi
if [[ $(cat file.txt|head -$L3|tail +$L3) == "" ]]||[[ $(cat file.txt|head -$L3|tail +$L3) == " " ]]
then
L3EMPTY=true
else
L3EMPTY=false
fi
if [ $L1EMPTY = true ]&&[ $L2EMPTY = true ]&&[ $L3EMPTY = true ]
then
#do not cat line to temp file
echo "Skipping line "$CURRENTLINE
else
echo "$(cat file.txt|head -$CURRENTLINE|tail +$CURRENTLINE)">>temp.txt
echo "Writing line " $CURRENTLINE
fi
((CURRENTLINE++))
done
cat temp.txt>file.txt
rm -r temp.txt
FINALTOTALLINES="$(cat file.txt|wc -l)"
EMPTYLINELINT=$(( $CURRENTLINE - $FINALTOTALLINES ))
echo "Deleted " $EMPTYLINELINT " empty lines."