Perl regex match string including multiple newlines - perl

How do I match a multi-line string in a file, like this:
# custom prompt
aa=`command1 arg1 arg2`
bb=`command2 arg3 arg4`
PS1="$aa$bb"
# custom prompt
I am using this:
perl -0pe 's/# custom prompt\n.*\n.*\n.*\n# custom prompt\n//gm' -i .bashrc
I want to delete all the lines between the two # custom prompt markers (including the marker lines themselves). But this one-liner only works when the block is exactly 5 lines long. Is there a way to match an arbitrary number of lines, newlines included? Something like this (which does not work):
perl -0pe 's/# custom prompt[\n.]+# custom prompt\n//gm' -i .bashrc

Removing lines:
perl -i -0pe 's/# custom prompt.*?# custom prompt\s*\n//s' .bashrc
or with anchors
perl -i -0pe 's/^# custom prompt.*?^# custom prompt\n//sm' .bashrc
# the m modifier is needed so that ^ matches at the start of every line
.*? is the non-greedy way to write .*; it stops at the first closing marker instead of the last one.

While this isn't "in-place" per se, it's more straightforward than incrementing tracking counters, and it also allows for blank lines between the two # custom prompt markers:
Assumptions:
- there are exactly two # custom prompt markers in the input, no more, no less
- nothing comes before the first # custom prompt
printf '%s\n' '# custom prompt
aa=`command1 arg1 arg2`
bb=`command2 arg3 arg4`
PS1="$aa$bb"
cc=`command3 arg5 arg6`
dd=`command4 arg7 arg8`
# custom prompt
PS2="$bb$aa"' |
gtee >( gcat -n | mawk 'BEGIN { print } END { print RS } !_' >&2; ) |
{m,g,n}awk '($!NF = $NF)^_' FS='[#] custom prompt\n' RS='^$' ORS= |
gcat -b | mawk 'BEGIN { print (_="---- AFTER ----") } END { print (_)"\n" } _'
1 # custom prompt
2 aa=`command1 arg1 arg2`
3 bb=`command2 arg3 arg4`
4 PS1="$aa$bb"
5 cc=`command3 arg5 arg6`
6
7 dd=`command4 arg7 arg8`
8 # custom prompt
9 PS2="$bb$aa"
---- AFTER ----
1 PS2="$bb$aa"
---- AFTER ----

Related

Subset a string in POSIX shell

I have a variable set in the following format:
var1="word1 word2 word3"
Is it possible to delete one or more of the space-delimited words portably? What I want to achieve is something like this:
when the --ignore option is supplied with the following arguments
$ cmd --ignore word1 # case 1
$ cmd --ignore "word1 word2" # case 2
I want var1 to end up with only the following value:
"word2 word3" # case 1
"word3" # case 2
If there is no way to achieve the above, is there a way to improve the efficiency of the following for loop? ($var1 is iterated in a for loop, so my alternative idea for getting a similar result was the following code.)
# while loop to get argument from options
# argument of `--ignore` is assigned to `$sh_ignore`
for i in $var1
do
# check $i in $sh_ignore instead of other way around
# to avoid unmatch when $sh_ignore has more than 1 word
if ! echo "$sh_ignore" | grep "$i";
then
# normal actions
else
# skipped
fi
done
-------Update-------
After looking around and reading the comment by @chepner, I am now temporarily using the following code (and am looking for improvements):
sh_ignore=''
while :; do
case $1 in
# some other option handling
--ignore)
if [ "$2" ]; then
sh_ignore=$2
shift
else
# defined `die` as print err msg + exit 1
die 'ERROR: "--ignore" requires a non-empty option argument.'
fi
;;
# handling if no arg is supplied to --ignore
# handling -- and unknown opt
esac
shift
done
if [ -n "$sh_ignore" ]; then
for d in $sh_ignore
do
var1="$(echo "$var1" | sed -e "s,$d,,")"
done
fi
# for loop with trimmed $var1 as downstream
for i in $var1
do
# normal actions
done
One method might be:
var1=$(echo "$var1" |
tr ' ' '\n' |
grep -Fxv -e "$(echo "$sh_ignore" | tr ' ' '\n')" |
tr '\n' ' ')
Note: this will leave a trailing blank, which can be trimmed off via var1=${var1% }
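The pipeline above, together with the trailing-blank trim, can be wrapped up and checked against the two cases from the question. This is a sketch under the assumption that words contain no whitespace or newlines; remove_words and result are names invented here.

```shell
# Hypothetical helper: print the list in $1 with every word listed in $2 removed.
remove_words() {
    result=$(printf '%s\n' "$1" |
        tr ' ' '\n' |
        grep -Fxv -e "$(printf '%s\n' "$2" | tr ' ' '\n')" |
        tr '\n' ' ')
    # tr leaves one trailing blank; strip it as noted above
    printf '%s\n' "${result% }"
}

var1="word1 word2 word3"
remove_words "$var1" "word1"          # case 1
remove_words "$var1" "word1 word2"    # case 2
```

grep -F treats each line of the pattern as a fixed string and -x requires a whole-line match, so word1 does not accidentally remove something like word10.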

How to deal with sed in Tcl script with substitution

I am writing a Tcl script which inserts some text into a file after the matched line. The following is the basic code of the script.
set test_lists [list "test_1"\
"test_2"\
"test_3"\
"test_4"\
"test_5"
]
foreach test $test_lists {
set content "
'some_data/$test'
"
exec sed -i "/dog/a$content" /Users/l/Documents/Codes/TCL/file.txt
}
However, when I run this script, it always shows me this error:
dyn-078192:TCL l$ tclsh test.tcl
sed: -e expression #1, char 12: unknown command: `''
while executing
"exec sed -i "/dog/a$content" /Users/l/Documents/Codes/TCL/file.txt"
("foreach" body line 5)
invoked from within
"foreach test $test_lists {
set content "
'some_data/$test'
"
exec sed -i "/dog/a$content" /Users/l/Documents/Codes/TCL/file.txt
}"
(file "test.tcl" line 8)
Somehow it always tries to evaluate the first word in $content as a command.
Any idea what I should do here to make this work?
Thanks.
You first should decide exactly what characters need to be processed by sed. (See https://unix.stackexchange.com/questions/445531/how-to-chain-sed-append-commands for why this can matter…) They might possibly be:
/dog/a\
'some_data/test_1'
which would turn a file like:
abc
dog
hij
into
abc
dog
'some_data/test_1'
hij
If that's what you want, you can then proceed to the second stage: getting those characters from Tcl into sed.
# NB: *no* newline here!
set content "'some_data/$test'"
# NB: there is a quoted backslash and two quoted newlines here
exec sed -i "/dog/a\\\n$content\n" /Users/l/Documents/Codes/TCL/file.txt
One of the few places where you need to be careful with quoting in Tcl is when you have backslashes and newlines in close proximity.
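The first stage (deciding which characters sed must receive) can be checked from a plain POSIX shell before involving Tcl at all. The sketch below uses the a\ form followed by a newline and the text; append_after_dog is a hypothetical helper name, and it assumes the appended text contains no backslashes or newlines.

```shell
# Hypothetical helper: append the text in $1 after every line matching /dog/.
append_after_dog() {
    # Inside double quotes, \\ becomes a single backslash, so sed receives:
    #   /dog/a\
    #   <text>
    sed "/dog/a\\
$1"
}

printf 'abc\ndog\nhij\n' | append_after_dog "'some_data/test_1'"
```

If this prints the four lines shown earlier (abc, dog, 'some_data/test_1', hij), the remaining work is purely a matter of producing the same characters from Tcl.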
Why not perform the text transformation directly in Tcl itself? This might reverse the order of the inserted lines compared to the original code. You can fix that by applying lreverse to the list at a convenient time, and perhaps you will also want to massage the inserted text further. Those are all refinements...
set test_lists [list "'some_data/test_1'"\
"'some_data/test_2'"\
"'some_data/test_3'"\
"'some_data/test_4'"\
"'some_data/test_5'"
]
set filename /Users/l/Documents/Codes/TCL/file.txt
set REGEXP "dog"
# Read in the data; this is good even for pretty large files
set f [open $filename]
set lines [split [read $f] "\n"]
close $f
# Search for first matching line by regular expression
set idx [lsearch -regexp $lines $REGEXP]
if {$idx >= 0} {
# Found something, so do the insert in the list of lines
set lines [linsert $lines [expr {$idx + 1}] {*}$test_lists]
# Write back to the file as we've made changes
set f [open $filename "w"]
puts -nonewline $f [join $lines "\n"]
close $f
}
(an extended comment, not an answer)
Running this in the shell to clarify your desired output: is this what you want?
$ cat file.txt
foo
dog A
dog B
dog C
dog D
dog E
bar
$ for test in test_{1..5}; do content="some_data/$test"; sed -i "/dog/a$content" file.txt; done
$ cat file.txt
foo
dog A
some_data/test_5
some_data/test_4
some_data/test_3
some_data/test_2
some_data/test_1
dog B
some_data/test_5
some_data/test_4
some_data/test_3
some_data/test_2
some_data/test_1
dog C
some_data/test_5
some_data/test_4
some_data/test_3
some_data/test_2
some_data/test_1
dog D
some_data/test_5
some_data/test_4
some_data/test_3
some_data/test_2
some_data/test_1
dog E
some_data/test_5
some_data/test_4
some_data/test_3
some_data/test_2
some_data/test_1
bar

find the line number where a specific word appears with “sed” on tcl shell

I need to search for a specific word in a file starting from specific line and return the line numbers only for the matched lines.
Let's say I want to search a file called myfile for the word my_word and then store the returned line numbers.
In a shell script, the command:
sed -n '10,$ { /$my_word /= }' $myfile
works fine, but how do I write that command in the Tcl shell?
% exec sed -n '10,$ { /$my_word/= }' $file
extra characters after close-brace.
I want to add that the following command works fine on tcl shell but it starts from the beginning of the file
% exec sed -n "/$my_word/=" $file
447431
447445
448434
448696
448711
448759
450979
451006
451119
451209
451245
452936
454408
I have solved the problem as follows
set lineno 10
if { ! [catch {exec sed -n "/$new_token/=" $file} lineFound] && [string length $lineFound] > 0 } {
set lineNumbers [split $lineFound "\n"]
foreach num $lineNumbers {
if {[expr {$num >= $lineno}] } {
lappend col $num
}
}
}
I still can't find a single line that solves the problem.
Any suggestions?
There is one thing I don't understand: is the text you are looking for stored inside the variable called my_word, or is it the literal value my_word?
In your line
% exec sed -n '10,$ { /$my_word/= }' $file
I'd say it's the first case. So you have before it something like
% set my_word wordtosearch
% set file filetosearchin
Your mistake is to use the single quote character ' to enclose the sed expression. That character is an enclosing operator in sh, but has no meaning in Tcl.
You use it in sh to group many words in a single argument that is passed to sed, so you have to do the same, but using Tcl syntax:
% set my_word wordtosearch
% set file filetosearchin
% exec sed -n "10,$ { /$my_word/= }" $file
Here, you use the "..." to group.
You don't escape the $ in $my_word because you want $my_word to be substituted with the string wordtosearch.
I hope this helps.
After some trial and error I came up with:
set output [exec sed -n "10,\$ \{ /$myword/= \}" $myfile]
# Do something with the output
puts $output
The key is to escape characters that are special to Tcl, such as the dollar sign and curly braces.
Update
Per Donal Fellows, we do not need to escape the dollar sign:
set output [exec sed -n "10,$ \{ /$myword/= \}" $myfile]
I have tried the new revision and found it works. Thank you, Donal.
Update 2
I finally gained access to a Windows 7 machine, installed Cygwin (which includes sed and tclsh). I tried out the above script and it works just fine. I don't know what your problem is. Interestingly, the same script failed on my Mac OS X system with the following error:
sed: 1: "10,$ { /ipsum/= }": extra characters at the end of = command
while executing
"exec sed -n "10,$ \{ /$myword/= \}" $myfile"
invoked from within
"set output [exec sed -n "10,$ \{ /$myword/= \}" $myfile]"
(file "sed.tcl" line 6)
I guess there is a difference between Linux and BSD systems.
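The error message points at the command before the closing brace: BSD sed (as shipped with Mac OS X) wants that command terminated by a newline or a semicolon, while GNU sed accepts the one-line form. A sketch of a spelling that should work with both, shown from a shell; line_numbers_from is a hypothetical helper, and the word is assumed to contain no slash or other characters special to sed.

```shell
# Hypothetical helper: print the numbers of lines >= $1 in file $3 that match $2.
line_numbers_from() {
    # \$ keeps the dollar sign literal for sed ("last line");
    # the newlines inside the braces terminate each command portably
    sed -n "$1,\$ {
/$2/=
}" "$3"
}

tmp=$(mktemp)
printf 'foo\nbar\nfoo\nbaz\nfoo\n' > "$tmp"
line_numbers_from 2 foo "$tmp"
rm -f "$tmp"
```

With the sample file, the match on line 1 is skipped because the range starts at line 2.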
Update 3
I have tried the same script under Linux/Tcl 8.4 and it works. That might mean Tcl 8.4 has nothing to do with it. Here is something else that might help: Tcl comes with a package called fileutil, which is part of the tcllib. The fileutil package contains a useful tool for this case: fileutil::grep. Here is a sample on how to use it in your case:
package require fileutil
proc grep_demo {myword myfile} {
foreach line [fileutil::grep $myword $myfile] {
# Each line is in the format:
# filename:linenumber:text
set lineNumber [lindex [split $line :] 1]
if {$lineNumber >= 10} { puts $lineNumber}
}
}
puts [grep_demo $myword $myfile]
Here is how to do it with awk:
awk 'NR>=10 && $0~f {print NR}' f="$my_word" "$myfile"
This prints the line number of every line from line 10 onward that matches the word stored in the variable my_word, in the file whose name is stored in the variable myfile.

sed + awk + verify line in file

I have the following example file
/etc/sysconfig/network/script.sh = -exe $Builder
run_installation 123 44 556 4 = run_installation arg1 arg2 arg3 948
EXE=somthing
EXE somthing
I have three questions (I am writing a bash script):
1. How can I verify with sed or awk that the string "-exe" exists after the "=" character?
2. How can I verify with sed or awk that the string run_installation appears both as the first word of the line and after the "=" character, as in the example file above?
3. The string EXE in the file can appear as "EXE" or as "EXE="; how can I delete either form with sed?
I do:
sed s'/EXE//g' | sed s'/EXE=//g'
but that is not a nice way to do it in my bash script.
• I need three different answers!
Lidia
You did not give further criteria on what to do if conditions 1 and 2 are not found...
awk '/=.*-exe/{f=1;}
/^run_installation.*=.*run_installation/{g=1}
/^EXE/{ gsub(/EXE=|EXE/,"") }
f && g{ print "ok" ;exit }
' file
The above code checks for condition 1 and condition 2 and prints "ok" when both are found. The substitution of EXE for condition 3 is added for illustration purposes. State more clearly what you want to do and show your expected output next time.
To verify them separately,
awk '/= -exe/{print "found"}' file
awk '/^run_installation.*=.*run_installation/{print "found"}' file
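For the third question, the two chained sed calls can probably be collapsed into one expression that treats the "=" as optional. A minimal sketch using a POSIX basic-regex interval; strip_exe is a hypothetical helper name, and the sample lines are the last two lines of the example file.

```shell
# Hypothetical helper: delete every "EXE" together with an immediately following "=", if any.
strip_exe() {
    # =\{0,1\} means "zero or one equals sign" in a basic regular expression
    sed 's/EXE=\{0,1\}//g' "$@"
}

printf 'EXE=somthing\nEXE somthing\n' | strip_exe
```

Matching the optional "=" in the same expression avoids the ordering problem of removing EXE first and leaving a stray "=" behind.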

Count the number of occurrences of a string using sed?

I have a file which contains "title" written in it many times. How can I find the number of times "title" is written in that file using the sed command provided that "title" is the first string in a line? e.g.
# title
title
title
should output a count of 2, because in the first line title is not the first string.
Update
I used awk to find the total number of occurrences as:
awk '$1 ~ /title/ {++c} END {print c}' FS=: myFile.txt
But how can I tell awk to count only those lines that have title as the first string, as explained in the example above?
Never say never. Pure sed (although it may require the GNU version).
#!/bin/sed -nf
# based on a script from the sed info file (info sed)
# section 4.8 Numbering Non-blank Lines (cat -b)
# modified to count lines that begin with "title"
/^title/! be
x
/^$/ s/^.*$/0/
/^9*$/ s/^/0/
s/.9*$/x&/
h
s/^.*x//
y/0123456789/1234567890/
x
s/x.*$//
G
s/\n//
h
:e
$ {x;p}
Explanation:
#!/bin/sed -nf
# run sed without printing output by default (-n)
# using the following file as the sed script (-f)
/^title/! be # if the current line doesn't begin with "title" branch to label e
x # swap the counter from hold space into pattern space
/^$/ s/^.*$/0/ # if pattern space is empty start the counter at zero
/^9*$/ s/^/0/ # if pattern space consists only of nines, prepend a zero
s/.9*$/x&/ # mark the position of the last digit before a sequence of nines (if any)
h # copy the marked counter to hold space
s/^.*x// # delete everything before the marker
y/0123456789/1234567890/ # increment the digits that were after the mark
x # swap pattern space and hold space
s/x.*$// # delete everything after the marker leaving the leading digits
G # append hold space to pattern space
s/\n// # remove the newline, leaving all the digits concatenated
h # save the counter into hold space
:e # label e
$ {x;p} # if this is the last line of input, swap in the counter and print it
Here are excerpts from a trace of the script using sedsed:
$ echo -e 'title\ntitle\nfoo\ntitle\nbar\ntitle\ntitle\ntitle\ntitle\ntitle\ntitle\ntitle\ntitle' | sedsed-1.0 -d -f ./counter
PATT:title$
HOLD:$
COMM:/^title/ !b e
COMM:x
PATT:$
HOLD:title$
COMM:/^$/ s/^.*$/0/
PATT:0$
HOLD:title$
COMM:/^9*$/ s/^/0/
PATT:0$
HOLD:title$
COMM:s/.9*$/x&/
PATT:x0$
HOLD:title$
COMM:h
PATT:x0$
HOLD:x0$
COMM:s/^.*x//
PATT:0$
HOLD:x0$
COMM:y/0123456789/1234567890/
PATT:1$
HOLD:x0$
COMM:x
PATT:x0$
HOLD:1$
COMM:s/x.*$//
PATT:$
HOLD:1$
COMM:G
PATT:\n1$
HOLD:1$
COMM:s/\n//
PATT:1$
HOLD:1$
COMM:h
PATT:1$
HOLD:1$
COMM::e
COMM:$ {
PATT:1$
HOLD:1$
PATT:title$
HOLD:1$
COMM:/^title/ !b e
COMM:x
PATT:1$
HOLD:title$
COMM:/^$/ s/^.*$/0/
PATT:1$
HOLD:title$
COMM:/^9*$/ s/^/0/
PATT:1$
HOLD:title$
COMM:s/.9*$/x&/
PATT:x1$
HOLD:title$
COMM:h
PATT:x1$
HOLD:x1$
COMM:s/^.*x//
PATT:1$
HOLD:x1$
COMM:y/0123456789/1234567890/
PATT:2$
HOLD:x1$
COMM:x
PATT:x1$
HOLD:2$
COMM:s/x.*$//
PATT:$
HOLD:2$
COMM:G
PATT:\n2$
HOLD:2$
COMM:s/\n//
PATT:2$
HOLD:2$
COMM:h
PATT:2$
HOLD:2$
COMM::e
COMM:$ {
PATT:2$
HOLD:2$
PATT:foo$
HOLD:2$
COMM:/^title/ !b e
COMM:$ {
PATT:foo$
HOLD:2$
. . .
PATT:10$
HOLD:10$
PATT:title$
HOLD:10$
COMM:/^title/ !b e
COMM:x
PATT:10$
HOLD:title$
COMM:/^$/ s/^.*$/0/
PATT:10$
HOLD:title$
COMM:/^9*$/ s/^/0/
PATT:10$
HOLD:title$
COMM:s/.9*$/x&/
PATT:1x0$
HOLD:title$
COMM:h
PATT:1x0$
HOLD:1x0$
COMM:s/^.*x//
PATT:0$
HOLD:1x0$
COMM:y/0123456789/1234567890/
PATT:1$
HOLD:1x0$
COMM:x
PATT:1x0$
HOLD:1$
COMM:s/x.*$//
PATT:1$
HOLD:1$
COMM:G
PATT:1\n1$
HOLD:1$
COMM:s/\n//
PATT:11$
HOLD:1$
COMM:h
PATT:11$
HOLD:11$
COMM::e
COMM:$ {
COMM:x
PATT:11$
HOLD:11$
COMM:p
11
PATT:11$
HOLD:11$
COMM:}
PATT:11$
HOLD:11$
The ellipsis represents lines of output I omitted here. The line with "11" on it by itself is where the final count is output. That's the only output you'd get when the sedsed debugger isn't being used.
Revised answer
Succinctly, you can't - sed is not the correct tool for the job (it has no built-in counter).
sed -n '/^title/p' file | grep -c ''
This looks for lines starting with title and prints them, feeding the output into grep to count them (the empty pattern matches every line). Or, equivalently:
grep -c '^title' file
Original answer - before the question was edited
Succinctly, you can't - it is not the correct tool for the job.
grep -c title file
sed -n /title/p file | wc -l
The second uses sed as a surrogate for grep and sends the output to 'wc' to count lines. Both count the number of lines containing 'title', rather than the number of occurrences of title.
You could fix that with something like:
cat file |
tr ' ' '\n' |
grep -c title
The 'tr' command converts blanks into newlines, thus putting each space separated word on its own line, and therefore grep only gets to count lines containing the word title. That works unless you have sequences such as 'title-entitlement' where there's no space separating the two occurrences of title.
I don't think sed would be appropriate, unless you use it in a pipeline to convert your file so that the word you need appears on separate lines, and then use grep -c to count the occurrences.
I like Jonathan's idea of using tr to convert spaces to newlines. The beauty of this method is that successive spaces get converted to multiple blank lines but it doesn't matter because grep will be able to count just the lines with the single word 'title'.
Just one gawk command will do. Don't use grep -c, because it only counts lines with "title" in them, regardless of how many times "title" appears on the line.
$ more file
# title
# title
one
two
#title
title title
three
title junk title
title
four
fivetitlesixtitle
last
$ awk '!/^#.*title/{m=gsub("title","");total+=m}END{print "total: "total}' file
total: 7
If you just want "title" as the first string, use "==" instead of ~:
awk '$1 == "title"{++c}END{print c}' file
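The first-word test and the line-prefix test (grep -c '^title') agree on the sample from the question, but they are not the same check. A small sketch to show the difference; count_first_word and count_prefix are hypothetical names, and the c + 0 is added here so that a count of zero prints 0 rather than an empty line.

```shell
# Hypothetical helpers for comparing the two notions of "title comes first".
count_first_word() { awk '$1 == "title" { ++c } END { print c + 0 }' "$@"; }
count_prefix()     { grep -c '^title' "$@"; }

# Sample from the question: both approaches report 2
printf '# title\ntitle\ntitle\n' | count_first_word
printf '# title\ntitle\ntitle\n' | count_prefix

# "titles" begins with title but its first word is not exactly "title"
printf 'titles\ntitle x\n' | count_first_word
printf 'titles\ntitle x\n' | count_prefix
```

Which of the two is wanted depends on whether a line such as "titles" should count.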
sed 's/title/title\n/g' file | grep -c title
This might work for you:
sed '/^title/!d' file | sed -n '$='