use processed filename and prepend to first line of the file - sed

I have a bunch of HTML files as shown below:
ati.4.html
fbdevhw.4.html
isdn.ctrl.4.html
modul.efile.4.html
ran.dom.4m.html
tw.policy.4p.html
I need to take the name of the HTML file and prepend it to the first line of each file, such that the lines should be:
<h1>ati(4) - some text tp append</h1>
<h1>fbdevhw(4) - some text tp append</h1>
<h1>isdn.ctrl(4) - some text tp append</h1>
<h1>modul.efile(4) - some text tp append</h1>
<h1>ran.dom(4m) - some text tp append</h1>
<h1>tw.policy(4p) - some text tp append</h1>
Here is what I have done till now. I am close but I think there is a better way to do it in single sed command.
for filename in `ls`
do
rep_text=`echo $filename | sed 's/\.html/\) - some text tp append<\/h1>/' | sed 's/^/<h1>/'`
sed -i "1 i\${rep_text}" $filename
done
Output lines which I am getting for prepending:
<h1>ati.4) - some text tp append</h1>
<h1>fbdevhw.4) - some text tp append</h1>
<h1>isdn.ctrl.4) - some text tp append</h1>
<h1>modul.efile.4) - some text tp append</h1>
<h1>ran.dom.4m) - some text tp append</h1>
<h1>tw.policy.4p) - some text tp append</h1>
I am not able to convert the second-to-last dot (.) into an opening parenthesis: a file name can contain multiple dots, and only the second-to-last one should be replaced with (.

You may use this script:
for f in *.html; do
IFS=. read -ra a <<< "$f" # split into array using . as delimiter
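n=$((${#a[@]} - 2)) # set n=length(array) - 2
# ${f%.*.*} drops the last two dot-separated parts, so names with extra dots (isdn.ctrl) stay whole
sed -i "1 i\<h1>${f%.*.*}(${a[$n]}) - some text to append</h1>" "$f"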
done
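Before editing any files, it can help to preview the heading that would be generated for each name. The following is a minimal sketch using only shell parameter expansion; `heading_for` is a hypothetical helper name, and it assumes every name ends in `.html` and has a section suffix such as `.4` or `.4m` before it.

```shell
# heading_for FILE: print the <h1> line for a name like ran.dom.4m.html
# (hypothetical helper, not part of the answers above)
heading_for() {
    base=${1%.html}        # strip the extension:        ran.dom.4m
    section=${base##*.}    # text after the last dot:    4m
    name=${base%.*}        # text before the last dot:   ran.dom
    printf '<h1>%s(%s) - some text to append</h1>\n' "$name" "$section"
}

# Dry run over the sample names from the question
for f in ati.4.html isdn.ctrl.4.html ran.dom.4m.html; do
    heading_for "$f"
done
```

Once the preview looks right, the same function could feed whichever insertion method you prefer.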

Do not use sed at all. It will be simpler.
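tmpf=$(mktemp)
trap 'rm "$tmpf"' EXIT
for file in *.html; do
    base=${file%.html}      # strip the extension, e.g. ran.dom.4m
    section=${base##*.}     # part after the last dot, e.g. 4m
    name=${base%.*}         # part before the last dot, e.g. ran.dom
    {
        printf "<h1>%s(%s) - some text tp append</h1>\n" "$name" "$section"
        cat "$file"         # original contents follow the new first line
    } > "$tmpf"
    cp "$tmpf" "$file"      # overwrite the original file, not the stripped name
done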
Remember to: Check your script with shellcheck. Do not parse ls. Quote variable expansions. Use $(...) instead of backticks `.

how to replace a block of text with new block of text?

As the title says, I have to replace a block of text in a file with a new block of text.
I have searched all over, but every solution I found was too specific to its question. Isn't it possible to create a function that is flexible and reusable?
To be specific, I need something that takes these options:
1) File (where the changes are to be made)
2) Existing block of text
3) New block of text
(Options 2 and 3 could be given either as manually pasted text or as cat $somefile.)
That way I could change these three and use the script for every case of text block replacement. I am sure it would help many other people too.
As an example, I currently need to replace the first block of text below with the second one, in the file $HOME/block.txt. I still need a solution that is easily reusable and flexible, as mentioned above.
- name: Set default_volumes variable
  set_fact:
    default_volumes:
      - "/opt/lidarr:/config"
      - "/opt/scripts:/scripts"
      - "/mnt:/mnt"
      - "/mnt/unionfs/Media/Music:/music"
which should become:
- name: Set default_volumes variable
  set_fact:
    default_volumes:
      - "/opt/lidarr:/config"
      - "/opt/scripts:/scripts"
      - "/mnt:/mnt"
      - "/mnt/unionfs/Media/Music:/music"
      - "/mnt/unionfs/downloads/lidarr:/downloads-amd"
PS: the spacing and indentation must be preserved during the replacement.
Your data is serialized using YAML. You should treat it as such.
Using yq
yq eval '
.[0].set_fact.default_volumes +=
[ "/mnt/unionfs/downloads/lidarr:/downloads-amd" ]
'
Recent releases of this yq (the Go implementation that provides eval) accept -i to edit the file in place. Where that option is unavailable, you can use sponge to achieve the same thing.
yq eval '
.[0].set_fact.default_volumes +=
[ "/mnt/unionfs/downloads/lidarr:/downloads-amd" ]
' a.yaml | sponge a.yaml
Using Perl
perl -MYAML -0777ne'
my $d = Load($_);
push @{ $d->[0]{set_fact}{default_volumes} },
"/mnt/unionfs/downloads/lidarr:/downloads-amd";
print Dump($d);
'
To name the file to process on the command line and edit it in place, it would look like this:
perl -i -MYAML -0777ne'
my $d = Load($_);
push @{ $d->[0]{set_fact}{default_volumes} },
"/mnt/unionfs/downloads/lidarr:/downloads-amd";
print Dump($d);
' file.yaml
Using GNU awk for multi-char RS and ARGIND, this will work for any chars in your old or new text including regexp metachars, delimiters, quotes, and backreferences as it's just doing literal string search and replace:
awk -v RS='^$' -v ORS= '
ARGIND==1 { old=$0; next }
ARGIND==2 { new=$0; next }
s=index($0,old) {
$0 = substr($0,1,s-1) new substr($0,s+length(old))
}
1' old new file
or you can do the same using any awk in any shell on every Unix box with:
awk -v ORS= '
{ rec = (FNR>1 ? rec RS : "") $0 }
FILENAME==ARGV[1] { old=rec; next }
FILENAME==ARGV[2] { new=rec; next }
END {
$0 = rec
if ( s=index($0,old) ) {
$0 = substr($0,1,s-1) new substr($0,s+length(old))
}
print
}
' old new file
For example:
$ head old new file
==> old <==
- name: Set default_volumes variable
set_fact:
default_volumes:
- "/opt/lidarr:/config"
- "/opt/scripts:/scripts"
- "/mnt:/mnt"
- "/mnt/unionfs/Media/Music:/music"
==> new <==
- name: Set default_volumes variable
set_fact:
default_volumes:
- "/opt/lidarr:/config"
- "/opt/scripts:/scripts"
- "/mnt:/mnt"
- "/mnt/unionfs/Media/Music:/music"
- "/mnt/unionfs/downloads/lidarr:/downloads-amd"
==> file <==
foo
- name: Set default_volumes variable
set_fact:
default_volumes:
- "/opt/lidarr:/config"
- "/opt/scripts:/scripts"
- "/mnt:/mnt"
- "/mnt/unionfs/Media/Music:/music"
bar
$ awk -v RS='^$' -v ORS= 'ARGIND==1{old=$0; next} ARGIND==2{new=$0; next} s=index($0,old){ $0=substr($0,1,s-1) new substr($0,s+length(old))} 1' old new file
foo
- name: Set default_volumes variable
set_fact:
default_volumes:
- "/opt/lidarr:/config"
- "/opt/scripts:/scripts"
- "/mnt:/mnt"
- "/mnt/unionfs/Media/Music:/music"
- "/mnt/unionfs/downloads/lidarr:/downloads-amd"
bar
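Since the question asks for something reusable, the portable awk approach above can be wrapped in a shell function. This is only a sketch: `replace_block` is a hypothetical name, it replaces just the first literal occurrence, and it assumes the three arguments are distinct file paths that do not contain `=`.

```shell
# replace_block OLD_FILE NEW_FILE TARGET_FILE
# Prints TARGET_FILE with the first literal occurrence of the text in OLD_FILE
# replaced by the text in NEW_FILE. No regexps are involved, so any
# characters are safe in either block. (Hypothetical helper name.)
replace_block() {
    awk '
        { rec = (FNR > 1 ? rec "\n" : "") $0 }    # rebuild the current file as one string
        FILENAME == ARGV[1] { old = rec; next }   # first file: block to find
        FILENAME == ARGV[2] { new = rec; next }   # second file: replacement block
        END {
            s = index(rec, old)                   # literal search in the target text
            if (s) rec = substr(rec, 1, s - 1) new substr(rec, s + length(old))
            print rec
        }
    ' "$1" "$2" "$3"
}
```

A possible way to update the file itself is `replace_block old new file > file.tmp && mv file.tmp file`. Pasted text can be supplied by writing it to a temporary file first.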
For a task like this, you could just use existing commands rather than
reinventing the wheel:
sed '/some text to change/,/with indentation/d; /a bit more/r new_file' your_file
I used two example files:
# original file
some original text to keep
a bit more
some text to change
- with indentation
rest of original text
is kept
and:
# replacement text
SOME TEXT TO ADD
- WITH DIFFERENT INDENTATION
- ANOTHER LEVEL
Then the command works by first deleting lines between and including two
lines matching patterns:
sed '/some text to change/,/with indentation/d;
Then reading the replacement text from some other file, using a pattern
matching just where the old text used to start:
/a bit more/r new_file' your_file
To yield the result:
some original text to keep
a bit more
SOME TEXT TO ADD
- WITH DIFFERENT INDENTATION
- ANOTHER LEVEL
rest of original text
is kept
Edit
The above is better than my original way:
sed '/a bit more/q' your_file > composite; cat new_file >> composite; sed -n '/rest of original text/,/$/p' your_file >> composite

Using SED to Remove Anything but a Pattern

I have a bunch of .pdf file names. For example:
901201_HKW_RNT_HW21_136_137_DE_442_Freigabe_DE_CLX.pdf
and I am trying to remove everything but the pattern XXX_XXX, where X is always a digit.
The result should be:
136_137
So far I have done the opposite: I managed to match the pattern (and remove it) by using:
set NoSpacesString to do shell script "echo " & quoted form of insideName & " | sed 's/([0-9][0-9][0-9]_[0-9][0-9][0-9])//'"
My goal is to set NoSpacesString to 136_137.
A little bit of help, please.
Thank you!
P.S. The rest of the code is in AppleScript, if this matters.
Fixing sed command...
You can use
sed -n 's/.*\([0-9]\{3\}_[0-9]\{3\}\).*/\1/p'
Details
-n - suppresses the default line output
s/.*\([0-9]\{3\}_[0-9]\{3\}\).*/\1/ - finds the .*\([0-9]\{3\}_[0-9]\{3\}\).* pattern that matches
.* - any zero or more chars
\([0-9]\{3\}_[0-9]\{3\}\) - Group 1 (the \1 in the RHS refers to this group value): three digits, _, three digits
.* - any zero or more chars
p - prints the result of the substitution only.
The regex above is a POSIX BRE compliant pattern. The same can be written in POSIX ERE:
sed -En 's/.*([0-9]{3}_[0-9]{3}).*/\1/p'
Final AppleScript code
set noSpacesString to do shell script "sed -En 's/.*([0-9]{3}_[0-9]{3}).*/\\1/p' <<<" & insideName's quoted form
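The ERE command can be checked directly in a shell before it is embedded in AppleScript. A small sketch, where `extract_pair` is a hypothetical wrapper name and the sample name is the one from the question:

```shell
# extract_pair NAME: print the XXX_XXX digit group found in NAME, if any
# (hypothetical helper wrapping the sed -En command shown above)
extract_pair() {
    printf '%s\n' "$1" | sed -En 's/.*([0-9]{3}_[0-9]{3}).*/\1/p'
}

extract_pair '901201_HKW_RNT_HW21_136_137_DE_442_Freigabe_DE_CLX.pdf'
```

Because the leading `.*` is greedy, a name containing several such groups yields the last one; a name with none prints nothing.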
This might work for you (GNU sed):
sed -E '/\n/{P;D};s/[0-9]{3}_[0-9]{3}/\n&\n/;D' file
This solution will print all occurrences of the pattern on a separate line.
The initial command depends on what follows, so it is described last.
The second command (the substitution) wraps the first occurrence of the desired pattern in newlines, one before and one after.
The D command deletes up to and including the first newline and, as the pattern space is not empty, restarts the sed cycle without reading the next line.
Now the initial command comes into play: the front of the pattern space (the matched pattern) is printed by P and then deleted by D, along with its trailing newline.
Again, the sed cycle is restarted as if the line had never been presented, but minus all characters up to and including the first occurrence of the desired pattern.
This flip-flop pattern of control is repeated until nothing is left, and then repeated on subsequent lines until the end of the file.
Here is a copy of the debug log for a suitable one line input containing two representations of the desired pattern:
SED PROGRAM:
/\n/ {
P
D
}
s/[0-9]{3}_[0-9]{3}/
&
/
D
INPUT: 'file' line 1
PATTERN: aaa123_456bbb123_456ccc
COMMAND: /\n/ {
COMMAND: }
COMMAND: s/[0-9]{3}_[0-9]{3}/
&
/
MATCHED REGEX REGISTERS
regex[0] = 3-10 '123_456'
PATTERN: aaa\n123_456\nbbb123_456ccc
MATCHED REGEX REGISTERS
regex[0] = 0-3 'aaa'
PATTERN: \n123_456\nbbb123_456ccc
COMMAND: D
PATTERN: 123_456\nbbb123_456ccc
COMMAND: /\n/ {
COMMAND: P
123_456
COMMAND: D
PATTERN: bbb123_456ccc
COMMAND: /\n/ {
COMMAND: }
COMMAND: s/[0-9]{3}_[0-9]{3}/
&
/
MATCHED REGEX REGISTERS
regex[0] = 3-10 '123_456'
PATTERN: bbb\n123_456\nccc
MATCHED REGEX REGISTERS
regex[0] = 0-3 'bbb'
PATTERN: \n123_456\nccc
COMMAND: D
PATTERN: 123_456\nccc
COMMAND: /\n/ {
COMMAND: P
123_456
COMMAND: D
PATTERN: ccc
COMMAND: /\n/ {
COMMAND: }
COMMAND: s/[0-9]{3}_[0-9]{3}/
&
/
PATTERN: ccc
MATCHED REGEX REGISTERS
regex[0] = 0-3 'ccc'
PATTERN:
COMMAND: D

How to extract a specific character inside a parentheses using sed command?

I want to extract the atomic symbol inside parentheses using sed.
The data I have is in the form C(X12), and I only want the X symbol.
For example, this test command:
echo "C(Br12)" | sed 's/[0-9][0-9])$//g'
gives me C(Br.
You can use
sed -n 's/.*(\(.*\)[0-9]\{2\})$/\1/p'
See the online demo:
sed -n 's/.*(\(.*\)[0-9]\{2\})$/\1/p' <<< "c(Br12)"
# => Br
Details
-n - suppresses the default line output
.*(\(.*\)[0-9]\{2\})$ - a regex that matches
.* - any text
( - a ( char
\(.*\) - Capturing group 1: any text up to the last....
[0-9]\{2\} - two digits
)$ - a ) at the end of string
\1 - replaces with Group 1 value
p - prints the result of the substitution.
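The pattern above expects exactly two trailing digits. If the number of digits can vary, a variant that captures only letters is one option. This is a sketch under the assumption that the symbol consists solely of ASCII letters; `symbol_of` is a hypothetical helper name.

```shell
# symbol_of TEXT: print the letters between "(" and the trailing digits
# (hypothetical helper; assumes input shaped like C(Br12))
symbol_of() {
    printf '%s\n' "$1" | sed -n 's/.*(\([A-Za-z]*\)[0-9]*)$/\1/p'
}

symbol_of 'C(Br12)'
symbol_of 'C(X7)'
```

Restricting the group to `[A-Za-z]*` keeps digits out of the capture regardless of how many follow the symbol.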
For example:
echo "C(Br12)" | sed 's/C(\(.\).*/\1/'
C( - matches the literal text C(
\(.\) - matches any single character and "remembers" it in backreference \1
.* - matches (and so discards) everything after it
\1 - replaces the whole match with the remembered text, i.e. only the first character (B for the input above).
Research sed, regex and backreferences for more information.
Try using the following command
echo "C(BR12)" | cut -d "(" -f2 | cut -d ")" -f1 | sed 's/[0-9]*//g'
The cut commands split the string and return the text between the parentheses. That text is then passed to sed, which removes the digits from it.
Not a fully sed solution but this will get you the output.

divide each line in equal part

I would be happy if anyone could suggest a command (a sed or awk one-liner) to divide each line of a file into an equal number of parts, for example into 4 parts.
Input:
ATGCATHLMNPHLNTPLML
Output:
ATGCA THLMN PHLNT PLML
This should work using GNU sed:
sed -r 's/(.{4})/\1 /g'
-r is needed to use extended regular expressions
.{4} captures every four characters
\1 refers to the captured group which is surrounded by the parenthesis ( ) and adds a space behind this group
g makes sure that the replacement is done as many times as possible on each line
A test; this is the input and output in my terminal:
$ echo "ATGCATHLMNPHLNTPLML" | sed -r 's/(.{4})/\1 /g'
ATGC ATHL MNPH LNTP LML
I suspect awk is not the best tool for this, but:
gawk --posix '{ l = sprintf( "%d", 1 + (length()-1)/4);
gsub( ".{"l"}", "& " ) } 1' input-file
If you have a POSIX-compliant awk you can omit --posix. GNU awk releases before 4.0 need --posix (or --re-interval) to enable interval expressions such as {5}; newer releases enable them by default. Since gawk seems to be the most commonly used implementation, I've given the solution in terms of gawk.
This might work for you (GNU sed):
sed 'h;s/./X/g;s/^\(.*\)\1\1\1/\1 \1 \1 \1/;G;s/\n/&&/;:a;/^\n/bb;/^ /s/ \(.*\n.*\)\n\(.\)/\1 \n\2/;ta;s/^.\(.*\n.*\)\n\(.\)/\1\2\n/;ta;:b;s/\n//g' file
Explanation:
h copy the pattern space (PS) to the hold space (HS)
s/./X/g replace every character in the PS with the same non-space character (in this case X)
s/^\(.*\)\1\1\1/\1 \1 \1 \1/ split the line into 4 parts (space separated)
G append a newline followed by the contents of the HS to the PS
s/\n/&&/ double the newline (to be later used as markers)
:a introduce a loop label
/^\n/bb if we reach a newline we are done and branch to the b label
/^ /s/ \(.*\n.*\)\n\(.\)/\1 \n\2/;ta; if the first character is a space add a space to the real line at this point and repeat
s/^.\(.*\n.*\)\n\(.\)/\1\2\n/;ta any other character just bump along and repeat
:b;s/\n//g all done just remove the markers and print out the result
This works for a line of any length; however, if the length is not exactly divisible by 4, the last portion will contain the remainder as well.
perl
perl might be a better choice here:
export cols=4
perl -ne 'chomp; $fw = 1 + int length()/$ENV{cols}; while(/(.{1,$fw})/gm) { print $1 . " " } print "\n"'
This re-calculates field-width for every line.
coreutils
A GNU coreutils alternative, field-width is chosen based on the first line of infile:
cols=4
len=$(( $(head -n1 infile | wc -c) - 1 ))
fw=$(echo "scale=0; 1 + $len / $cols" | bc)
cut_arg=$(paste -d- <(seq 1 $fw $len) <(seq $fw $fw $len) | head -c-1 | tr '\n' ',')
Value of cut_arg is in the above case:
1-5,6-10,11-15,16-
Now cut the line into appropriate chunks:
cut --output-delimiter=' ' -c $cut_arg infile
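For comparison, here is a minimal sketch in plain awk that reproduces the output requested in the question (19 characters split as 5, 5, 5, 4). It is not taken from any of the answers above: `split_equal` is a hypothetical name, and it assumes the part width should be the line length divided by the number of parts, rounded up, recomputed for every line.

```shell
# split_equal [N]: read lines on stdin and print each one cut into N
# space-separated parts of width ceil(length/N); the last part may be shorter.
# (Hypothetical helper; N defaults to 4.)
split_equal() {
    awk -v n="${1:-4}" '{
        len = length($0)
        w = int((len + n - 1) / n)        # ceiling of len / n
        out = ""
        for (i = 1; i <= len; i += w)     # step through the line one part at a time
            out = out (i > 1 ? " " : "") substr($0, i, w)
        print out
    }'
}

printf '%s\n' ATGCATHLMNPHLNTPLML | split_equal 4
```

Rounding up keeps the number of parts at N or fewer, with any shortfall landing in the final part.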

Count the number of occurrences of a string using sed?

I have a file which contains "title" written in it many times. How can I find the number of times "title" is written in that file using the sed command provided that "title" is the first string in a line? e.g.
# title
title
title
should output the count = 2 because in first line title is not the first string.
Update
I used awk to find the total number of occurrences as:
awk '$1 ~ /title/ {++c} END {print c}' FS=: myFile.txt
But how can I tell awk to count only those lines having title the first string as explained in example above?
Never say never. Pure sed (although it may require the GNU version).
#!/bin/sed -nf
# based on a script from the sed info file (info sed)
# section 4.8 Numbering Non-blank Lines (cat -b)
# modified to count lines that begin with "title"
/^title/! be
x
/^$/ s/^.*$/0/
/^9*$/ s/^/0/
s/.9*$/x&/
h
s/^.*x//
y/0123456789/1234567890/
x
s/x.*$//
G
s/\n//
h
:e
$ {x;p}
Explanation:
#!/bin/sed -nf
# run sed without printing output by default (-n)
# using the following file as the sed script (-f)
/^title/! be # if the current line doesn't begin with "title" branch to label e
x # swap the counter from hold space into pattern space
/^$/ s/^.*$/0/ # if pattern space is empty start the counter at zero
/^9*$/ s/^/0/ # if pattern space consists only of nines, prepend a zero
s/.9*$/x&/ # mark the position of the last digit before a sequence of nines (if any)
h # copy the marked counter to hold space
s/^.*x// # delete everything before the marker
y/0123456789/1234567890/ # increment the digits that were after the mark
x # swap pattern space and hold space
s/x.*$// # delete everything after the marker leaving the leading digits
G # append hold space to pattern space
s/\n// # remove the newline, leaving all the digits concatenated
h # save the counter into hold space
:e # label e
$ {x;p} # if this is the last line of input, swap in the counter and print it
Here are excerpts from a trace of the script using sedsed:
$ echo -e 'title\ntitle\nfoo\ntitle\nbar\ntitle\ntitle\ntitle\ntitle\ntitle\ntitle\ntitle\ntitle' | sedsed-1.0 -d -f ./counter
PATT:title$
HOLD:$
COMM:/^title/ !b e
COMM:x
PATT:$
HOLD:title$
COMM:/^$/ s/^.*$/0/
PATT:0$
HOLD:title$
COMM:/^9*$/ s/^/0/
PATT:0$
HOLD:title$
COMM:s/.9*$/x&/
PATT:x0$
HOLD:title$
COMM:h
PATT:x0$
HOLD:x0$
COMM:s/^.*x//
PATT:0$
HOLD:x0$
COMM:y/0123456789/1234567890/
PATT:1$
HOLD:x0$
COMM:x
PATT:x0$
HOLD:1$
COMM:s/x.*$//
PATT:$
HOLD:1$
COMM:G
PATT:\n1$
HOLD:1$
COMM:s/\n//
PATT:1$
HOLD:1$
COMM:h
PATT:1$
HOLD:1$
COMM::e
COMM:$ {
PATT:1$
HOLD:1$
PATT:title$
HOLD:1$
COMM:/^title/ !b e
COMM:x
PATT:1$
HOLD:title$
COMM:/^$/ s/^.*$/0/
PATT:1$
HOLD:title$
COMM:/^9*$/ s/^/0/
PATT:1$
HOLD:title$
COMM:s/.9*$/x&/
PATT:x1$
HOLD:title$
COMM:h
PATT:x1$
HOLD:x1$
COMM:s/^.*x//
PATT:1$
HOLD:x1$
COMM:y/0123456789/1234567890/
PATT:2$
HOLD:x1$
COMM:x
PATT:x1$
HOLD:2$
COMM:s/x.*$//
PATT:$
HOLD:2$
COMM:G
PATT:\n2$
HOLD:2$
COMM:s/\n//
PATT:2$
HOLD:2$
COMM:h
PATT:2$
HOLD:2$
COMM::e
COMM:$ {
PATT:2$
HOLD:2$
PATT:foo$
HOLD:2$
COMM:/^title/ !b e
COMM:$ {
PATT:foo$
HOLD:2$
. . .
PATT:10$
HOLD:10$
PATT:title$
HOLD:10$
COMM:/^title/ !b e
COMM:x
PATT:10$
HOLD:title$
COMM:/^$/ s/^.*$/0/
PATT:10$
HOLD:title$
COMM:/^9*$/ s/^/0/
PATT:10$
HOLD:title$
COMM:s/.9*$/x&/
PATT:1x0$
HOLD:title$
COMM:h
PATT:1x0$
HOLD:1x0$
COMM:s/^.*x//
PATT:0$
HOLD:1x0$
COMM:y/0123456789/1234567890/
PATT:1$
HOLD:1x0$
COMM:x
PATT:1x0$
HOLD:1$
COMM:s/x.*$//
PATT:1$
HOLD:1$
COMM:G
PATT:1\n1$
HOLD:1$
COMM:s/\n//
PATT:11$
HOLD:1$
COMM:h
PATT:11$
HOLD:11$
COMM::e
COMM:$ {
COMM:x
PATT:11$
HOLD:11$
COMM:p
11
PATT:11$
HOLD:11$
COMM:}
PATT:11$
HOLD:11$
The ellipsis represents lines of output I omitted here. The line with "11" on it by itself is where the final count is output. That's the only output you'd get when the sedsed debugger isn't being used.
Revised answer
Succinctly, you can't - sed is not the correct tool for the job (it cannot count).
sed -n '/^title/p' file | grep -c .
This looks for lines starting title and prints them, feeding the output into grep to count them. Or, equivalently:
grep -c '^title' file
Original answer - before the question was edited
Succinctly, you can't - it is not the correct tool for the job.
grep -c title file
sed -n /title/p file | wc -l
The second uses sed as a surrogate for grep and sends the output to 'wc' to count lines. Both count the number of lines containing 'title', rather than the number of occurrences of title.
You could fix that with something like:
cat file |
tr ' ' '\n' |
grep -c title
The 'tr' command converts blanks into newlines, thus putting each space separated word on its own line, and therefore grep only gets to count lines containing the word title. That works unless you have sequences such as 'title-entitlement' where there's no space separating the two occurrences of title.
I don't think sed would be appropriate, unless you use it in a pipeline to convert your file so that the word you need appears on separate lines, and then use grep -c to count the occurrences.
I like Jonathan's idea of using tr to convert spaces to newlines. The beauty of this method is that successive spaces get converted to multiple blank lines but it doesn't matter because grep will be able to count just the lines with the single word 'title'.
A single gawk command will do. Don't use grep -c, because it only counts lines with "title" in them, regardless of how many times "title" appears on each line.
$ more file
# title
# title
one
two
#title
title title
three
title junk title
title
four
fivetitlesixtitle
last
$ awk '!/^#.*title/{m=gsub("title","");total+=m}END{print "total: "total}' file
total: 7
if you just want "title" as the first string, use "==" instead of ~
awk '$1 == "title"{++c}END{print c}' file
sed 's/title/title\n/g' file | grep -c title
This might work for you:
sed '/^title/!d' file | sed -n '$='
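To tie the line-counting variants together, here is a small awk sketch that counts lines beginning with a given literal word and always prints a number, including 0 when nothing matches. `count_leading` is a hypothetical helper name; the word is matched literally at the very start of the line, so leading whitespace or a leading `#` excludes the line.

```shell
# count_leading WORD FILE: print how many lines of FILE start with WORD
# (hypothetical helper; WORD is compared literally, not as a regexp)
count_leading() {
    awk -v w="$1" '
        index($0, w) == 1 { c++ }     # WORD found at position 1 of the line
        END { print c + 0 }           # c + 0 prints 0 instead of an empty line
    ' "$2"
}
```

For the three-line example in the question (`# title`, `title`, `title`), `count_leading title myFile.txt` would report 2.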