sed - find multiple phrases and replace them [duplicate]

This question already has an answer here:
replace multiple strings in one line with sed
(1 answer)
Closed 1 year ago.
I have a sed command that searches for one phrase and, if it is found, replaces the whole line:
sed 's/.*phrase.*/123/'
That works great, but how can I use multiple phrases and replace the whole line if even one of them is found?
I tried the command below, but without success:
sed 's/.*phrase1|phrase2.*/123/'
I am using GNU sed.

You need to use an alternation operator like this:
sed 's/.*\(phrase1\|phrase2\).*/123/' # BRE way (GNU sed)
sed -E 's/.*(phrase1|phrase2).*/123/' # POSIX ERE way
In a BRE pattern, \(phrase1\|phrase2\) is a capturing group that matches either phrase1 or phrase2. The \| alternation operator is a GNU extension, not part of POSIX BRE.
In POSIX ERE (enabled with the -E option), remove the backslashes from these constructs: (phrase1|phrase2).
Grouping is necessary because it restricts the alternation to the grouped constructs. Without it, .*phrase1\|phrase2.* would match either everything from the start of the line through the last phrase1, or everything from phrase2 to the end of the line.
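To see the difference grouping makes, here is a small sketch that runs both forms against the same input. The three sample lines are made up for illustration, and the -E form is used so the same commands should behave alike on GNU and BSD sed.

```shell
#!/bin/sh
# Hypothetical sample input: two lines contain a phrase, one does not.
input='foo phrase1 bar
nothing here
baz phrase2 qux'

# Grouped alternation: any line containing either phrase becomes "123".
grouped=$(printf '%s\n' "$input" | sed -E 's/.*(phrase1|phrase2).*/123/')

# Ungrouped alternation: the pattern is ".*phrase1" OR "phrase2.*",
# so only part of each matching line is replaced.
ungrouped=$(printf '%s\n' "$input" | sed -E 's/.*phrase1|phrase2.*/123/')

printf 'grouped:\n%s\n\nungrouped:\n%s\n' "$grouped" "$ungrouped"
```

With the grouped form the first and third lines become 123; with the ungrouped form they become "123 bar" and "baz 123".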

You are almost there: you need to put the different phrases between parentheses. Also, if you want to replace all occurrences of your pattern, you need to add g at the end.
You can take a look at the docs:
https://www.gnu.org/software/sed/manual/sed.html

Related

sed command to replace a value in a file not using find and replace

I have a file named log.txt, and inside the file I have multiple lines:
line 1 text
line2/random/string/version:0.0.30
line 3 randome stuff
http://someurl:8550/
Currently I use sed to find 0.0.30 and replace it with a new value such as 0.0.31:
sed -i s/0.0.30/0.0.31/g log.txt
The problem with this is that I need to know the previous value.
Is there a way to always remove the version (here 0.0.30) from the string in the file and replace it with a new value?
Maybe something like an indexOf or a substring.

You can use a regular expression to match 0.0.30 and replace it with 0.0.31, as below. The --posix flag ensures that no GNU extensions are applied and that plain BRE (Basic Regular Expressions) syntax is used; \{2\} is the BRE syntax for matching exactly two occurrences of the preceding digit class.
sed -i --posix 's/[[:digit:]]\.[[:digit:]]\.[[:digit:]]\{2\}/0.0.31/' file
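If the version numbers can have more digits than in the sample, a looser pattern anchored on the "version:" prefix may be easier to maintain. The following is only a sketch under that assumption: the new_version variable is hypothetical, and the input mirrors the log.txt content from the question instead of editing a real file.

```shell
#!/bin/sh
# Hypothetical new version; in practice this might come from a build script.
new_version='0.0.31'

# Sample content mirroring the log.txt from the question.
content='line 1 text
line2/random/string/version:0.0.30
line 3 randome stuff
http://someurl:8550/'

# Match "version:" followed by three dot-separated numbers of any length,
# and keep the "version:" prefix through the \1 back-reference.
updated=$(printf '%s\n' "$content" |
  sed -E "s/(version:)[0-9]+\.[0-9]+\.[0-9]+/\1${new_version}/")

printf '%s\n' "$updated"
```

Only the line containing version: changes; the previous value never has to be spelled out.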

How to use capture groups with sed?

I'm trying to replace some text in a file using sed, but I'm having trouble.
sed -ir 's/(\$hello = )true/\1false/' /path/to/my/file.txt gives the error sed: -e expression #1, char 27: invalid reference \1 on 's' command's RHS.
I want to replace $hello = true with $hello = false, so in order to avoid typing $hello = twice I wanted to use capture groups - which isn't working.
What am I doing wrong?

In extended regex mode you don't have to escape parentheses, if that was your intent with the r in -ir. However, if you want both the -i and -r options, you have to keep them apart or use -ri instead of -ir, because the latter treats everything after -i as an optional backup suffix.
From sed manual
Because -i takes an optional argument, it should
not be followed by other short options:
sed -Ei '...' FILE
Same as -E -i with no backup suffix - FILE will be edited in-place without creating a backup.
sed -iE '...' FILE
This is equivalent to --in-place=E, creating FILEE as backup
of FILE
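Here is a minimal sketch of the working form, assuming GNU sed (BSD sed handles -i differently). It edits a throwaway temporary file, not /path/to/my/file.txt, so it can be run safely.

```shell
#!/bin/sh
# Work on a throwaway file so nothing real is modified.
tmp=$(mktemp)
printf '%s\n' '$hello = true' > "$tmp"

# -E and -i are given separately, so -i gets no backup suffix (GNU sed assumed).
# In ERE mode, unescaped parentheses form the capture group referenced by \1.
sed -E -i 's/(\$hello = )true/\1false/' "$tmp"

result=$(cat "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Writing the options as -Ei (or -ri) should behave the same way, since the -i then comes last.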

You must escape the parentheses with backslashes, \(...\), for them to be used for grouping.
See THE SED FAQ; section "3.1.2. Escape characters on the right side of "s///"" has an example:
3.1.2. Escape characters on the right side of "s///"
The right-hand side (the replacement part) in "s/find/replace/" is
almost always a string literal, with no interpolation of these
metacharacters:
. ^ $ [ ] { } ( ) ? + * |
Three things are interpolated: ampersand (&), backreferences, and
options for special seds. An ampersand on the RHS is replaced by
the entire expression matched on the LHS. There is never any
reason to use grouping like this:
s/\(some-complex-regex\)/one two \1 three/
And later in section "F. GNU sed v2.05 and higher versions":
F. GNU sed v2.05 and higher versions
...
Undocumented -r switch:
Beginning with version 3.02, GNU sed has an undocumented -r switch
(undocumented till version 4.0), activating Extended Regular
Expressions in the following manner:
? - 0 or 1 occurrence of previous character
+ - 1 or more occurrences of previous character
| - matches the string on either side, e.g., foo|bar
(...) - enable grouping without backslash
{...} - enable interval expression without backslash
When the -r switch (mnemonic: "regular expression") is used, prefix
these symbols with a backslash to disable the special meaning.
For documentation of regular expression syntax used in (GNU) sed, see Overview of basic regular expression syntax
5.3 Overview of basic regular expression syntax
...
\(regexp\)
Groups the inner regexp as a whole, this is used to:
Apply postfix operators, like \(abcd\)*: this will search for zero or more whole sequences of ‘abcd’, while abcd* would search for ‘abc’ followed by zero or more occurrences of ‘d’. Note that support for \(abcd\)* is required by POSIX 1003.1-2001, but many non-GNU implementations do not support it and hence it is not universally portable.
Use back references (see below).
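Both uses of BRE grouping can be tried without -r or -E. This is only an illustrative sketch; the abcdabcdx test string is made up, and the manual's portability note about \(abcd\)* applies to it.

```shell
#!/bin/sh
# Back reference: in basic regex mode the group needs backslashes, \( \).
bre_result=$(printf '%s\n' '$hello = true' | sed 's/\(\$hello = \)true/\1false/')

# Postfix operator on a group: * repeats the whole sequence "abcd".
group_star=$(printf 'abcdabcdx\n' | sed 's/\(abcd\)*/[&]/')

# Without the group, * applies only to the preceding "d".
char_star=$(printf 'abcdabcdx\n' | sed 's/abcd*/[&]/')

printf '%s\n%s\n%s\n' "$bre_result" "$group_star" "$char_star"
```

The & in the replacement stands for the whole match, which makes it easy to see how far each pattern reached.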

Why is my sed multiline find-and-replace not working as expected?

I have a simple sed command that I am using to replace everything between (and including) //thistest.com-- and --thistest.com with nothing (to remove the block altogether):
sudo sed -i "s#//thistest\.com--.*--thistest\.com##g" my.file
The contents of my.file are:
//thistest.com--
zone "awebsite.com" {
type master;
file "some.stuff.com.hosts";
};
//--thistest.com
As I am using # as my delimiter for the regex, I don't need to escape the / characters. I am also properly (I think) escaping the . in .com. So I don't see exactly what is failing.
Why isn't the entire block being replaced?

You have two problems:
Sed doesn't do multiline pattern matches—at least, not the way you're expecting it to. However, you can use multiline addresses as an alternative.
Depending on your version of sed, you may need to escape alternate delimiters, especially if you aren't using them solely as part of a substitution expression.
So, the following will work with your posted corpus in both GNU and BSD flavors:
sed '\#^//thistest\.com--#, \#^//--thistest\.com# d' /tmp/corpus
Note that in this version, we tell sed to match all lines between (and including) the two patterns. The opening delimiter of each address pattern is properly escaped. The command has also been changed to d for delete instead of s for substitute, and some whitespace was added for readability.
I've also chosen to anchor the address patterns to the start of each line. You may or may not find that helpful with this specific corpus, but it's generally wise to do so when you can, and doesn't seem to hurt your use case.
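To try the address-range form without touching a real file, the corpus can be piped in. In this sketch the two "keep" lines are added purely for illustration, to show that only the marked block is removed.

```shell
#!/bin/sh
# The posted corpus, with one made-up line before and after the block.
corpus='keep this line
//thistest.com--
zone "awebsite.com" {
type master;
file "some.stuff.com.hosts";
};
//--thistest.com
keep this line too'

# Delete every line from the opening marker through the closing marker.
remaining=$(printf '%s\n' "$corpus" |
  sed '\#^//thistest\.com--#,\#^//--thistest\.com#d')

printf '%s\n' "$remaining"
```

Only the two "keep" lines should remain in the output.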

# separation by line with 1 s//
sed -n -e 'H;${x;s#^\(.\)\(.*\)\1//thistest.com--.*\1//--thistest.com#\2#;p}' YourFile
# separation by line with address pattern
sed -e '\#//thistest.com--#,\#//--thistest.com# d' YourFile
# separation only by char (could be CR, CR/LF, ";" or "oneline") with s//
sed -n -e '1h;1!H;${x;s#//thistest.com--.*//--thistest.com##;p}' YourFile
Notes:
The s// variants assume there is only one thistest section per file; if there are several, they remove everything from the first opening marker to the last closing marker.
The s// variants are not suited to huge files, because they load the entire file into memory.
The address-pattern variant cannot select a section that opens and closes on the same line: it looks for the first pattern to start and then for the second pattern on a following line to stop. It is, however, very efficient on big files and on files with multiple sections.

Confining Substitution to Match Space Using sed?

Is there a way to substitute only within the match space using sed?
I.e. given the following line, is there a way to substitute only the "." chars that are contained within the matching single quotes and protect the "." chars that are not enclosed by single quotes?
Input:
'ECJ-4YF1H10.6Z' ! 'CAP' ! '10.0uF' ! 'TOL' ; MGCDC1008.S1 MGCDC1009.A2
Desired result:
'ECJ-4YF1H10-6Z' ! 'CAP' ! '10_0uF' ! 'TOL' ; MGCDC1008.S1 MGCDC1009.A2
Or is this just a job to which perl or awk might be better suited?
Thanks for your help,
Mark

Give the following a try which uses the divide-and-conquer technique:
sed "s/\('[^']*'\)/\n&\n/g;s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g;s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g;s/\n//g" inputfile
Explanation:
s/\('[^']*'\)/\n&\n/g - Add newlines before and after each pair of single quotes with their contents
s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g - Using a newline and the single quotes to key on, replace the dot with a dash for strings that end in "Z"
s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g - Using a newline and the single quotes to key on, replace the dot with an underscore for strings that end in "uF"
s/\n//g - Remove the newlines added in the first step
You can restrict the command to acting only on certain lines:
sed "/foo/{s/\('[^']*'\)/\n&\n/g;s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g;s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g;s/\n//g}" inputfile
where you would substitute some regex in place of "foo".
Some versions of sed like to be spoon fed (instead of semicolons between commands, use -e):
sed -e "/foo/{s/\('[^']*'\)/\n&\n/g" -e "s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g" -e "s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g" -e "s/\n//g}" inputfile
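As a quick check, the one-line form can be run against the sample line from the question. This sketch assumes GNU sed, because it relies on \n being understood in both the pattern and the replacement.

```shell
#!/bin/sh
# Sample line from the question (in a script, ! is literal inside double quotes).
line="'ECJ-4YF1H10.6Z' ! 'CAP' ! '10.0uF' ! 'TOL' ; MGCDC1008.S1 MGCDC1009.A2"

# 1) fence each quoted string with newlines, 2) fix strings ending in Z,
# 3) fix strings ending in uF, 4) remove the temporary newlines again.
fixed=$(printf '%s\n' "$line" |
  sed "s/\('[^']*'\)/\n&\n/g;s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g;s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g;s/\n//g")

printf '%s\n' "$fixed"
```

The dots outside the single quotes (MGCDC1008.S1 and MGCDC1009.A2) are left alone.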

$ cat phoo1234567_sedFix.sed
#! /bin/sed -f
/'[0-9][0-9]\.[0-9][a-zA-Z][a-zA-Z]'/s/'\([0-9][0-9]\)\.\([0-9][a-zA-Z][a-zA-Z]\)'/'\1_\2'/
This answers your specific question. If the pattern you need to fix isn't always like the example you provided, then you'll need multiple copies of this line, with the regular expressions modified to match your new change targets.
Note that the command has two parts: "/'[0-9][0-9]\.[0-9][a-zA-Z][a-zA-Z]'/" says that only lines matching this pattern are processed, while the trailing "s/'\([0-9][0-9]\)\.\([0-9][a-zA-Z][a-zA-Z]\)'/'\1_\2'/" is the part that does the substitution. The single quotes are repeated in the replacement because the match consumes them. You can add a 'g' after the final '/' to make this substitution happen on all instances of this pattern in each line.
The \(\) pairs in the match pattern are saved into numbered buffers that the substitution side of the command can reference (i.e. \1 \2). This is what gives sed power that awk doesn't have.
If you're going to do much of this kind of work, I highly recommend O'Reilly's sed & awk book. The time spent going through how sed works will be paid back many times.
I hope this helps.
P.S. as you appear to be a new user, if you get an answer that helps you please remember to mark it as accepted, or give it a + (or -) as a useful answer.

This is a job most suitable for awk or any language that supports breaking/splitting strings.
IMO, using sed for this task is doable, but the regex-based result is difficult to read and debug, so it is not the most appropriate tool for the job. No offense to sed fanatics.
awk '{
  for(i=1;i<=NF;i++) {
    if ($i ~ /\047/ ){
      gsub(/\./,"_",$i)
    }
  }
}1' file
The above says: for each field (the field separator is whitespace by default), check whether it contains a single quote (\047 in octal) and, if it does, replace every "." with "_". The dot is written as the regex literal /\./ because gsub treats its first argument as a regular expression, in which a bare "." would match any character. This method is simple and doesn't need a complicated regex.

Deleting multiline text from multiple files

I have a bunch of java files from which I want to remove the javadoc lines with the license [am changing it on my code].
The pattern I am looking for is
^\* \* ProjectName .* USA\.$
but matched across lines
Is there a way sed [or a commonly used editor in Windows/Linux] can do a search/replace for a multiline pattern?

Here's the appropriate reference point in my favorite sed tutorial.

Probably someone is still looking for such a solution from time to time. Here is one.
Use awk to find the lines to be removed. Then use diff to remove the lines and let sed clean up.
awk "/^\* \* ProjectName /,/ USA\.$/" input.txt \
| diff - input.txt \
| sed -n -e"s/^> //p" \
>output.txt
A word of warning: if the first pattern exists while the second does not, you will lose all text below the first pattern, so check that first.
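As a possible alternative to the awk/diff pipeline, a sed address range can delete the same block directly. This is only a sketch: the sample file content below is made up to fit the pattern in the question, and the same caveat applies if the closing pattern is missing.

```shell
#!/bin/sh
# Hypothetical Java file whose license lines fit the pattern in the question.
source_text='/*
* * ProjectName is distributed under some license;
* see the accompanying file for details.
* Some Foundation, Some City, USA.
*/
public class Foo {}'

# Delete from the line starting with "* * ProjectName " through the first
# following line that ends in " USA."
stripped=$(printf '%s\n' "$source_text" |
  sed '/^\* \* ProjectName /,/ USA\.$/d')

printf '%s\n' "$stripped"
```

For many files, the same expression could be applied in a loop or with an in-place option, once it has been checked on a copy.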

Yes. Are you using sed, awk, perl, or something else to solve this problem?
Most regular expression tools allow you to specify multi-line patterns. Just be careful with regular expressions that are too greedy, or they'll match the code between comments if it exists.
Here's an example:
/\*(?:.|[\r\n])*?\*/
perl -0777ne 'print m!/\*(?:.|[\r\n])*?\*/!g;' <file>
Prints out all the comments run together. The (?: notation must be used for non-capturing parentheses. / does not have to be escaped because ! delimits the expression. -0777 is used to enable slurp mode and -n enables automatic reading.
(From: http://ostermiller.org/findcomment.html )
