How to replace a block of text with a new block of text? - perl

As the title says, I have to replace a block of text in a file with a new block of text.
I have searched all over for this, but every solution I found was too specific to its own question. Isn't it possible to create a function that is flexible and reusable?
To be specific, I need something that takes these options:
1) File (where the changes are to be made)
2) Existing block of text
3) New block of text
(Options 2 and 3 could be given either as manually pasted text or as cat $somefile.)
That way I could change these three and use the script for every case of text block replacement. I am sure it will help many other people too.
As an example, I currently need to replace the first block of text below with the second one, and the file is $HOME/block.txt. I still need a solution that is easily reusable and flexible, as mentioned above.
- name: Set default_volumes variable
  set_fact:
    default_volumes:
      - "/opt/lidarr:/config"
      - "/opt/scripts:/scripts"
      - "/mnt:/mnt"
      - "/mnt/unionfs/Media/Music:/music"
- name: Set default_volumes variable
  set_fact:
    default_volumes:
      - "/opt/lidarr:/config"
      - "/opt/scripts:/scripts"
      - "/mnt:/mnt"
      - "/mnt/unionfs/Media/Music:/music"
      - "/mnt/unionfs/downloads/lidarr:/downloads-amd"
PS: The spacing and indentation must be preserved during the replacement.

Your data is serialized using YAML. You should treat it as such.
Using yq
yq eval '
.[0].set_fact.default_volumes +=
[ "/mnt/unionfs/downloads/lidarr:/downloads-amd" ]
'
Recent versions of yq accept -i to edit the file in place; where that option is unavailable, you can use sponge (from moreutils) to achieve the same thing.
yq eval '
.[0].set_fact.default_volumes +=
[ "/mnt/unionfs/downloads/lidarr:/downloads-amd" ]
' a.yaml | sponge a.yaml
Using Perl
perl -MYAML -0777ne'
my $d = Load($_);
push @{ $d->[0]{set_fact}{default_volumes} },
"/mnt/unionfs/downloads/lidarr:/downloads-amd";
print Dump($d);
'
To edit in place, add -i and pass the file to process as an argument to the Perl one-liner:
perl -i -MYAML -0777ne'
my $d = Load($_);
push @{ $d->[0]{set_fact}{default_volumes} },
"/mnt/unionfs/downloads/lidarr:/downloads-amd";
print Dump($d);
' file.yaml

Using GNU awk for multi-char RS and ARGIND, this will work for any chars in your old or new text including regexp metachars, delimiters, quotes, and backreferences as it's just doing literal string search and replace:
awk -v RS='^$' -v ORS= '
ARGIND==1 { old=$0; next }
ARGIND==2 { new=$0; next }
s=index($0,old) {
$0 = substr($0,1,s-1) new substr($0,s+length(old))
}
1' old new file
or you can do the same using any awk in any shell on every Unix box with:
awk -v ORS= '
{ rec = (FNR>1 ? rec RS : "") $0 }
FILENAME==ARGV[1] { old=rec; next }
FILENAME==ARGV[2] { new=rec; next }
END {
$0 = rec
if ( s=index($0,old) ) {
$0 = substr($0,1,s-1) new substr($0,s+length(old))
}
print
}
' old new file
For example:
$ head old new file
==> old <==
- name: Set default_volumes variable
set_fact:
default_volumes:
- "/opt/lidarr:/config"
- "/opt/scripts:/scripts"
- "/mnt:/mnt"
- "/mnt/unionfs/Media/Music:/music"
==> new <==
- name: Set default_volumes variable
set_fact:
default_volumes:
- "/opt/lidarr:/config"
- "/opt/scripts:/scripts"
- "/mnt:/mnt"
- "/mnt/unionfs/Media/Music:/music"
- "/mnt/unionfs/downloads/lidarr:/downloads-amd"
==> file <==
foo
- name: Set default_volumes variable
set_fact:
default_volumes:
- "/opt/lidarr:/config"
- "/opt/scripts:/scripts"
- "/mnt:/mnt"
- "/mnt/unionfs/Media/Music:/music"
bar
$ awk -v RS='^$' -v ORS= 'ARGIND==1{old=$0; next} ARGIND==2{new=$0; next} s=index($0,old){ $0=substr($0,1,s-1) new substr($0,s+length(old))} 1' old new file
foo
- name: Set default_volumes variable
set_fact:
default_volumes:
- "/opt/lidarr:/config"
- "/opt/scripts:/scripts"
- "/mnt:/mnt"
- "/mnt/unionfs/Media/Music:/music"
- "/mnt/unionfs/downloads/lidarr:/downloads-amd"
bar
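To get the reusable function the question asks for, the portable awk script can be wrapped in a small shell function. This is only a sketch: the name replace_block, its argument order, and the temporary-file handling are assumptions rather than part of the answer, and it replaces the first occurrence only.

```shell
# replace_block FILE OLD_FILE NEW_FILE   (hypothetical helper, not from the answer)
# Replaces the first literal occurrence of OLD_FILE's text in FILE with NEW_FILE's text.
replace_block() {
    rb_tmp=$(mktemp) || return 1
    awk -v ORS= '
        { rec = (FNR>1 ? rec RS : "") $0 }      # slurp the current file into rec
        FILENAME==ARGV[1] { old=rec; next }     # first argument: old block
        FILENAME==ARGV[2] { new=rec; next }     # second argument: new block
        END {
            if ( s=index(rec,old) ) {           # literal search, no regexp involved
                rec = substr(rec,1,s-1) new substr(rec,s+length(old))
            }
            print rec "\n"                      # put the final newline back
        }
    ' "$2" "$3" "$1" > "$rb_tmp" && cat "$rb_tmp" > "$1"
    rb_status=$?
    rm -f "$rb_tmp"
    return "$rb_status"
}
```

It would be called as replace_block "$HOME/block.txt" old new. Pasted text can be supplied by first saving it to a file, for example with a here-document.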

For a task like this, you could just use existing commands rather than
reinventing the wheel:
sed '/some text to change/,/with indentation/d; /a bit more/r new_file' your_file
I used two example files:
# original file
some original text to keep
a bit more
some text to change
- with indentation
rest of original text
is kept
and:
# replacement text
SOME TEXT TO ADD
- WITH DIFFERENT INDENTATION
- ANOTHER LEVEL
Then the command works by first deleting lines between and including two
lines matching patterns:
sed '/some text to change/,/with indentation/d;
Then reading the replacement text from some other file, using a pattern
matching just where the old text used to start:
/a bit more/r new_file' your_file
To yield the result:
some original text to keep
a bit more
SOME TEXT TO ADD
- WITH DIFFERENT INDENTATION
- ANOTHER LEVEL
rest of original text
is kept
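The delete-then-read idea can be parameterised so the patterns are not hard-coded. The function below is a sketch under assumptions: the name replace_range and its argument order are invented for illustration, and because the patterns are pasted into /.../ addresses they must not contain a slash.

```shell
# replace_range START_RE END_RE ANCHOR_RE NEW_FILE TARGET   (hypothetical helper)
# Deletes the lines from START_RE through END_RE and reads NEW_FILE in after
# the line matching ANCHOR_RE. Output goes to stdout; TARGET is not modified.
replace_range() {
    sed -e "/$1/,/$2/d" -e "/$3/r $4" "$5"
}
```

Unlike the literal awk approach, this matches lines by regular expression, so it suits cases where the block is identified by its first and last lines rather than by its exact text.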
Edit
The above is better than my original way:
sed '/a bit more/q' your_file > composite; cat new_file >> composite; sed -n '/rest of original text/,/$/p' your_file >> composite

Related

use processed filename and prepend to first line of the file

I have a bunch of HTML files as shown below:
ati.4.html
fbdevhw.4.html
isdn.ctrl.4.html
modul.efile.4.html
ran.dom.4m.html
tw.policy.4p.html
I need to take the name of the HTML file and prepend it to the first line of each file, such that the lines should be:
<h1>ati(4) - some text tp append</h1>
<h1>fbdevhw(4) - some text tp append</h1>
<h1>isdn.ctrl(4) - some text tp append</h1>
<h1>modul.efile(4) - some text tp append</h1>
<h1>ran.dom(4m) - some text tp append</h1>
<h1>tw.policy(4p) - some text tp append</h1>
Here is what I have done till now. I am close but I think there is a better way to do it in single sed command.
for filename in `ls`
do
rep_text=`echo $filename | sed 's/\.html/\) - some text tp append<\/h1>/' | sed 's/^/<h1>/'`
sed -i "1 i\\${rep_text}" $filename
done
Output lines which I am getting for prepending:
<h1>ati.4) - some text tp append</h1>
<h1>fbdevhw.4) - some text tp append</h1>
<h1>isdn.ctrl.4) - some text tp append</h1>
<h1>modul.efile.4) - some text tp append</h1>
<h1>ran.dom.4m) - some text tp append</h1>
<h1>tw.policy.4p) - some text tp append</h1>
I am not able to convert the second-to-last dot (.) into the opening parenthesis: there can be multiple dots in the file name, and I need to replace only the second-to-last one with (.
You may use this script:
for f in *.html; do
IFS=. read -ra a <<< "$f" # split into array using . as delimiter
n=$((${#a[@]} - 2)) # set n=length(array) - 2
name=${f%.*.*} # strip the last two dot-separated parts, keeping any dots inside the name
sed -i "1 i\<h1>${name}(${a[$n]}) - some text to append</h1>" "$f"
done
Do not use sed at all. It will be simpler.
tmpf=$(mktemp)
trap 'rm "$tmpf"' EXIT
for file in *.html; do
base=${file%.html} # e.g. isdn.ctrl.4
{
# name is everything before the last dot, section everything after it
printf "<h1>%s(%s) - some text tp append</h1>\n" "${base%.*}" "${base##*.}"
cat "$file"
} > "$tmpf"
cp "$tmpf" "$file"
done
Remember to: Check your script with shellcheck. Do not parse ls. Quote variable expansions. Use $(...) instead of backticks `.
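The heading can also be built with parameter expansion alone, which keeps names that contain dots intact. This is a sketch: heading_for and its two arguments are hypothetical names, not taken from either answer.

```shell
# heading_for FILE TEXT   (hypothetical helper)
# Prints the <h1> line for FILE, e.g. isdn.ctrl.4.html -> <h1>isdn.ctrl(4) - TEXT</h1>
heading_for() {
    hf_base=${1%.html}          # isdn.ctrl.4.html -> isdn.ctrl.4
    hf_name=${hf_base%.*}       # everything before the last dot -> isdn.ctrl
    hf_section=${hf_base##*.}   # everything after the last dot  -> 4
    printf '<h1>%s(%s) - %s</h1>\n' "$hf_name" "$hf_section" "$2"
}
```

The printed line can then be written ahead of the file's contents in the same way as in the loop above.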

Using SED to Remove Anything but a Pattern

I have a bunch of .pdf file names. For example:
901201_HKW_RNT_HW21_136_137_DE_442_Freigabe_DE_CLX.pdf
and I am trying to remove everything except the pattern XXX_XXX, where X is always a digit.
The result should be:
136_137
So far I did the opposite: I managed to match the pattern by using:
set NoSpacesString to do shell script "echo " & quoted form of insideName & " | sed 's/([0-9][0-9][0-9]_[0-9][0-9][0-9])//'"
My goal is to set NoSpacesString to 136_137.
A little bit of help, please.
Thank you!
P.S. The rest of the code is in AppleScript, if this matters.
Fixing sed command...
You can use
sed -n 's/.*\([0-9]\{3\}_[0-9]\{3\}\).*/\1/p'
Details
-n - suppresses the default line output
s/.*\([0-9]\{3\}_[0-9]\{3\}\).*/\1/ - finds the .*\([0-9]\{3\}_[0-9]\{3\}\).* pattern that matches
.* - any zero or more chars
\([0-9]\{3\}_[0-9]\{3\}\) - Group 1 (the \1 in the RHS refers to this group value): three digits, _, three digits
.* - any zero or more chars
p - prints the result of the substitution only.
The regex above is a POSIX BRE compliant pattern. The same can be written in POSIX ERE:
sed -En 's/.*([0-9]{3}_[0-9]{3}).*/\1/p'
Final AppleScript code
set noSpacesString to do shell script "sed -En 's/.*([0-9]{3}_[0-9]{3}).*/\\1/p' <<<" & insideName's quoted form
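Outside AppleScript, the same substitution can be tried directly in a shell. A minimal sketch, where extract_pair is a hypothetical name chosen for illustration:

```shell
# extract_pair NAME   (hypothetical helper)
# Prints the ddd_ddd group found in NAME, or nothing if there is none.
extract_pair() {
    printf '%s\n' "$1" | sed -n 's/.*\([0-9]\{3\}_[0-9]\{3\}\).*/\1/p'
}
```

Because the leading .* is greedy, a name containing several such groups yields the last one, and the pattern does not check that the digits are delimited, so 1136_1374 would also produce 136_137.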
This might work for you (GNU sed):
sed -E '/\n/{P;D};s/[0-9]{3}_[0-9]{3}/\n&\n/;D' file
This solution will print all occurrences of the pattern on a separate line.
The initial command is dependent on what follows.
The second command replaces the desired pattern, prepending and appending a newline on either side.
The D command removes up to the first newline, but as the pattern space is not empty, restarts the sed cycle (without appending the next line).
Now the initial command comes into play. The front of the line is printed and then deleted along with its appended newline.
Again, the sed cycle is restarted as if the line had never been presented but minus any characters up to and including the first desired pattern.
This flip-flop pattern of control is repeated until nothing is left and then repeated on subsequent lines until the end of the file.
Here is a copy of the debug log for a suitable one line input containing two representations of the desired pattern:
SED PROGRAM:
/\n/ {
P
D
}
s/[0-9]{3}_[0-9]{3}/
&
/
D
INPUT: 'file' line 1
PATTERN: aaa123_456bbb123_456ccc
COMMAND: /\n/ {
COMMAND: }
COMMAND: s/[0-9]{3}_[0-9]{3}/
&
/
MATCHED REGEX REGISTERS
regex[0] = 3-10 '123_456'
PATTERN: aaa\n123_456\nbbb123_456ccc
MATCHED REGEX REGISTERS
regex[0] = 0-3 'aaa'
PATTERN: \n123_456\nbbb123_456ccc
COMMAND: D
PATTERN: 123_456\nbbb123_456ccc
COMMAND: /\n/ {
COMMAND: P
123_456
COMMAND: D
PATTERN: bbb123_456ccc
COMMAND: /\n/ {
COMMAND: }
COMMAND: s/[0-9]{3}_[0-9]{3}/
&
/
MATCHED REGEX REGISTERS
regex[0] = 3-10 '123_456'
PATTERN: bbb\n123_456\nccc
MATCHED REGEX REGISTERS
regex[0] = 0-3 'bbb'
PATTERN: \n123_456\nccc
COMMAND: D
PATTERN: 123_456\nccc
COMMAND: /\n/ {
COMMAND: P
123_456
COMMAND: D
PATTERN: ccc
COMMAND: /\n/ {
COMMAND: }
COMMAND: s/[0-9]{3}_[0-9]{3}/
&
/
PATTERN: ccc
MATCHED REGEX REGISTERS
regex[0] = 0-3 'ccc'
PATTERN:
COMMAND: D
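Where GNU sed is not available, the same one-match-per-line output can be had from grep. This is an alternative rather than the answer's method, and it assumes a grep that supports -o (a widespread extension, not part of POSIX); all_pairs is a hypothetical name.

```shell
# all_pairs   (hypothetical helper)
# Reads stdin and prints every ddd_ddd match on its own line.
all_pairs() {
    grep -oE '[0-9]{3}_[0-9]{3}'
}
```

For the debug-log input above, printf 'aaa123_456bbb123_456ccc\n' | all_pairs prints 123_456 twice.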

How to extract a specific character inside a parentheses using sed command?

I want to extract the atomic symbol inside parentheses using sed.
The data I have is in the form C(X12), and I only want the X symbol.
For example, this test command:
echo "C(Br12)" | sed 's/[0-9][0-9])$//g'
gives me C(Br.
You can use
sed -n 's/.*(\(.*\)[0-9]\{2\})$/\1/p'
See the online demo:
sed -n 's/.*(\(.*\)[0-9]\{2\})$/\1/p' <<< "c(Br12)"
# => Br
Details
-n - suppresses the default line output
.*(\(.*\)[0-9]\{2\})$ - a regex that matches
.* - any text
( - a ( char
\(.*\) - Capturing group 1: any text up to the last....
[0-9]\{2\} - two digits
)$ - a ) at the end of string
\1 - replaces with Group 1 value
p - prints the result of the substitution.
For example:
echo "C(Br12)" | sed 's/C(\(.\).*/\1/'
C( - match the literal text C(
. - match any single character
\(.\) - match any one character and "remember" it in backreference \1
.* - match (and so discard) everything after it
\1 - replace the whole match with what was remembered: the first character.
Research sed, regex and backreferences for more information.
Try using the following command
echo "C(BR12)" | cut -d "(" -f2 | cut -d ")" -f1 | sed 's/[0-9]*//g'
The cut tool splits the input and gives you the string between the parentheses. That string is then passed to sed, which removes the digits from it.
Not a fully sed solution but this will get you the output.
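The three answers do not return the same thing for multi-letter symbols, which is easy to see by wrapping each command in a function and running them side by side. The function names are hypothetical labels added for this comparison.

```shell
# Each function wraps one of the commands above; the names are illustrative only.
symbol_full()  { printf '%s\n' "$1" | sed -n 's/.*(\(.*\)[0-9]\{2\})$/\1/p'; }
symbol_first() { printf '%s\n' "$1" | sed 's/C(\(.\).*/\1/'; }
symbol_cut()   { printf '%s\n' "$1" | cut -d "(" -f2 | cut -d ")" -f1 | sed 's/[0-9]*//g'; }

symbol_full  'C(Br12)'   # whole symbol
symbol_first 'C(Br12)'   # first character only
symbol_cut   'C(Br12)'   # whole symbol
```

For C(Br12) the first and third print Br while the second prints only B. The first also requires exactly two digits before the closing parenthesis and prints nothing otherwise.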

Replace one matched pattern with another in multiline text with sed

I have file with this text:
mirrors:
docker.io:
endpoint:
- "http://registry:5000"
registry:5000:
endpoint:
- "http://registry:5000"
localhost:
endpoint:
- "http://registry:5000"
I need to replace it with this text in POSIX shell script (not bash):
mirrors:
docker.io:
endpoint:
- "http://docker.io"
registry:5000:
endpoint:
- "http://registry:5000"
localhost:
endpoint:
- "http://localhost"
The replacement should be done dynamically in all places, without hard-coded names. That is, we should take the sub-string from the first line of each entry ("docker.io", "registry:5000", "localhost") and use it to replace the sub-string "registry:5000" in the third line.
I've figured out a regex that splits it into 5 groups: (^ )([^ ]*)(:[^"]*"http:\/\/)([^"]*)(")
Then I tried to use sed to print group 2 instead of group 4, but this didn't work: sed -n 's/\(^ \)\([^ ]*\)\(:[^"]*"http:\/\/\)\([^"]*\)\("\)/\1\2\3\2\5/p'
Please help!
This might work for you (GNU sed):
sed -E '1N;N;/\n.*endpoint:.*\n/s#((\S+):.*"http://)[^"]*#\1\2#;P;D' file
Open up a three line window into the file.
If the second line contains endpoint:, replace the last piece of text following http:// with the first piece of text before :
Print/Delete the first line of the window and then replenish the three line window by appending the next line.
Repeat until the end of the file.
Awk would be a better candidate for this. Pass the replacement string in as the variable str and the section to change (" docker.io", " localhost" or " registry:5000") as the variable findstr:
awk -v findstr=" docker.io" -v str="http://docker.io" '
$0 ~ findstr { dockfound=1 # We have found the section passed in findstr and so we set the dockfound marker
}
/endpoint/ && dockfound==1 { # We encounter endpoint after the dockfound marker is set and so we set the found marker
found=1;
print;
next
}
found==1 && dockfound==1 { # We know from the found and the dockfound markers being set that we need to process this line
match($0,/^[[:space:]]+-[[:space:]]"/); # Match the start of the line to the beginning quote
$0=substr($0,RSTART,RLENGTH)str"\""; # Print the matched section followed by the replacement string (str) and the closing quote
found=0; # Reset the markers
dockfound=0
}1' file
One liner:
awk -v findstr=" docker.io" -v str="http://docker.io" '$0 ~ findstr { dockfound=1 } /endpoint/ && dockfound==1 { found=1;print;next } found==1 && dockfound==1 { match($0,/^[[:space:]]+-[[:space:]]"/);$0=substr($0,RSTART,RLENGTH)str"\"";found=0;dockfound=0 }1' file
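Since the question asks for no hard-coded names, the mirror name can instead be picked up from the file itself. The following is a sketch under assumptions: fix_endpoints is a hypothetical name, every indented "key:" line other than "endpoint:" is taken to be a mirror name, and the names are assumed not to contain & or backslashes (which are special in the replacement text of sub).

```shell
# fix_endpoints FILE   (hypothetical helper; prints the result to stdout)
fix_endpoints() {
    awk '
        # an indented "key:" line other than "endpoint:" names the current mirror
        /^[ \t]+[^ \t]+:[ \t]*$/ && $1 != "endpoint:" {
            name = $1
            sub(/:$/, "", name)          # drop the trailing colon
        }
        # rewrite the host of a list item that follows a mirror name
        name != "" && /^[ \t]*- "http:\/\/[^"]*"/ {
            sub(/"http:\/\/[^"]*"/, "\"http://" name "\"")
        }
        { print }
    ' "$1"
}
```

Only the quoted URL is rewritten, so the leading whitespace of every line is left as it was.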

How to remove YAML frontmatter from markdown files?

I have markdown files that contain YAML frontmatter metadata, like this:
---
title: Something Somethingelse
author: Somebody Sometheson
---
But the YAML is of varying lengths. Can I use a POSIX command like sed to remove that frontmatter when it's at the beginning of a file? Something that just removes everything between --- and ---, inclusive, but also ignores the rest of the file, in case there are ---s elsewhere.
I understand your question to mean that you want to remove the first ----enclosed block if it starts at the first line. In that case,
sed '1 { /^---/ { :a N; /\n---/! ba; d} }' filename
This is:
1 { # in the first line
/^---/ { # if it starts with ---
:a # jump label for looping
N # fetch the next line, append to pattern space
/\n---/! ba; # if the result does not contain \n--- (that is, if the last
# fetched line does not begin with ---), go back to :a
d # then delete the whole thing.
}
}
# otherwise drop off the end here and do the default (print
# the line)
Depending on how you want to handle lines that begin with ---abc or so, you may have to change the patterns a little (perhaps add $ at the end to only match when the whole line is ---). I'm a bit unclear on your precise requirements there.
If you want to remove only the front matter, you could simply run:
sed '1{/^---$/!q;};1,/^---$/d' infile
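If the first line doesn't match ---, sed prints that line and quits, so nothing after it is output and the command is only suitable for files known to start with front matter; else it will delete everything from the 1st line up to (and including) the next line matching --- (i.e. the entire front matter).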
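To leave files without front matter untouched, the first line can be checked before the range delete is applied. A minimal sketch, assuming the delimiter is a line consisting of exactly ---; strip_front_matter is a hypothetical name.

```shell
# strip_front_matter FILE   (hypothetical helper; prints the result to stdout)
strip_front_matter() {
    if [ "$(head -n 1 "$1")" = "---" ]; then
        # the range opens on line 1; its closing --- is searched from line 2 on
        sed '1,/^---$/d' "$1"
    else
        cat "$1"
    fi
}
```

Later --- lines in the body are kept in both cases, because the range can only start on line 1.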
If you don't mind the "or something" being perl.
Simply print after two instances of "---" have been found:
perl -ne 'if ($i > 1) { print } else { /^---/ && $i++ }' yaml
or a bit shorter if you don't mind abusing ?: for flow control:
perl -ne '$i > 1 ? print : /^---/ && $i++' yaml
Be sure to include -i if you want to replace inline.
You can use a bash script: create script.sh, make it executable with chmod +x script.sh, and run it with ./script.sh.
#!/bin/bash
#folder articles contains a lot of markdown files
files=./articles/*.md
for f in $files;
do
#filename
echo "${f##*/}"
#replace frontmatter title attribute to "title"
sed -i -r 's/^title: (.*)$/title: "\1"/' $f
#...
done
This AWK-based solution works for files with and without front matter, doing nothing in the latter case.
#!/bin/sh
# Strips YAML front matter from a file (usually Markdown).
# Exit immediately on each error;
# see: https://vaneyckt.io/posts/safer_bash_scripts_with_set_euxo_pipefail/
set -Ee
print_help() {
echo "Strips YAML front matter from a file (usually Markdown)."
echo
echo "Usage:"
echo " `basename $0` -h"
echo " `basename $0` --help"
echo " `basename $0` -i <file-with-front-matter>"
echo " `basename $0` --in-place <file-with-front-matter>"
echo " `basename $0` <file-with-front-matter> <file-to-be-without-front-matter>"
}
replace=false
in_file="-"
out_file="/dev/stdout"
if [ -n "$1" ]
then
if [ "$1" = "-h" ] || [ "$1" = "--help" ]
then
print_help
exit 0
elif [ "$1" = "-i" ] || [ "$1" = "--in-place" ]
then
replace=true
in_file="$2"
out_file="$in_file"
else
in_file="$1"
if [ -n "$2" ]
then
out_file="$2"
fi
fi
fi
tmp_out_file="$out_file"
if $replace
then
tmp_out_file="${in_file}_tmp"
fi
awk '
BEGIN {
in_fm=0;
}
NR==1 && /^---$/ {
# front matter can only open on the very first line
in_fm=1;
next;
}
in_fm && /^(---|\.\.\.)$/ {
# closing delimiter (--- or a literal ...): leave the front matter, drop this line
in_fm=0;
next;
}
{
if (! in_fm) {
print $0;
}
}
' "$in_file" > "$tmp_out_file"
if $replace
then
mv "$tmp_out_file" "$out_file"
fi