Replace word starting with a - sed

I'm trying to understand sed. I have to replace every word starting with an a or A in a lipsum.txt with the word foobar. I started with cat lipsum.txt | sed 's/\ba/foobar/g' which already works, but it only replaces the a and not the entire word.
I read about using cat lipsum.txt | sed 's/\ba\w+/foobar/g' which targets the entire word. But it just doesn't replace anything. Same with cat lipsum.txt | sed 's/\ba[A-Z]*/foobar/g', it just leaves the text untouched. What am I doing wrong?
Also if I substitute the a with "(a|A)" to be case-insensitive it also just stops working.

With GNU sed, use
sed -i 's/\b[aA][[:alpha:]]*/foobar/g' file
Or,
sed -i 's/\b[aA][[:alnum:]]*/foobar/g' file
The \b[aA] matches an a or A at a word boundary, that is, at the start of a word. [[:alnum:]]* then matches zero or more alphanumeric characters, while [[:alpha:]]* matches zero or more letters. Note that -i edits the file in place.
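As for why the attempts in the question left the text untouched: sed uses basic regular expressions by default, where +, | and ( ) are literal characters, so \w+ looks for a literal plus sign and (a|A) for literal parentheses and a pipe. The sketch below assumes GNU sed (for \b and \w) and a made-up sample sentence, and contrasts the default syntax with -E (extended regular expressions).

```shell
# Sample text is invented for illustration only.
text='An apple and a banana are Almost ripe'

# Default (BRE): + is a literal character, so nothing in the sample matches.
bre=$(printf '%s\n' "$text" | sed 's/\ba\w+/foobar/g')

# Extended (-E): (a|A) is an alternation and \w* consumes the rest of the word.
ere=$(printf '%s\n' "$text" | sed -E 's/\b(a|A)\w*/foobar/g')

printf '%s\n%s\n' "$bre" "$ere"
```

The first line printed is the unchanged sample; the second has every word starting with a or A replaced.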

Related

sed command to replace a value in a file not using find and replace

I have a file with a string log.txt and inside the file i have multiple lines
line 1 text
line2/random/string/version:0.0.30
line 3 randome stuff
http://someurl:8550/
So currently I use sed to find and replace 0.0.30 to a new value like 0.0.31
with
sed -i s/0.0.30/0.0.31/g log.txt
The problem with this is I need to know the previous value.
Is there a way to always remove 0.0.30 from the string in the file and replace it with a new value ?
Maybe a indexof or a substring.
You can match the version with a regular expression instead of a literal, so you don't need to know the previous value. The pattern below matches 0.0.30 and replaces it with 0.0.31. The --posix flag disables GNU extensions so that the pattern is treated as a plain BRE (Basic Regular Expression); \{2\} is the BRE syntax for exactly two occurrences of the preceding digit class.
sed -i --posix 's/[[:digit:]]\.[[:digit:]]\.[[:digit:]]\{2\}/0.0.31/' file
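A variation, sketched under the assumption that the version always follows the literal text version: and consists only of digits and dots. It keeps the prefix with a capture group and replaces whatever number follows, so other digits in the file (such as the port in the URL) are left alone. The variable name new_version is hypothetical.

```shell
# Sample content mirroring the log.txt from the question.
log=$(printf '%s\n' 'line 1 text' \
                    'line2/random/string/version:0.0.30' \
                    'line 3 randome stuff' \
                    'http://someurl:8550/')
new_version=0.0.31   # assumed new value

# Keep "version:" (\1) and replace the dotted number after it.
updated=$(printf '%s\n' "$log" | sed "s/\(version:\)[0-9][0-9.]*/\1$new_version/")

printf '%s\n' "$updated"
```

To edit the real file in place with GNU sed, the same expression could be passed to sed -i followed by log.txt.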

Substring file name in Unix using sed command

I want to substring the File name in unix using sed command.
File name : Test_Test1_Test2_10082019_030013.csv.20191008-075740
I want the characters after the 3rd underscore or (all the characters after Test2 ) i need to be printed .
Can this be done using sed command?
I have tried this command
sed 's/^.*_\([^_]*\)$/\1/' <<< 'Test_Test1_Test2_10082019_030013.csv.20191008-075740'
but this is giving result as 030013.csv.20191008-075740
I need it from 10082019_030013.csv.20191008-075740
Thanks
Neha
To remove everything from the beginning up to and including the 3rd underscore you can use
sed 's/^\([^_]*_\)\{3\}//' <<< 'Test_Test1_Test2_10082019_030013.csv.20191008-075740'
This removes the initial part that consists of 3 groups of (any number of non-underscore characters followed by an underscore). The result is
10082019_030013.csv.20191008-075740
If you use GNU sed you can switch it to extended regular expressions and omit the backslashes.
sed -r 's/^([^_]*_){3}//' <<< 'Test_Test1_Test2_10082019_030013.csv.20191008-075740'
Could you please try following.
sed 's/\([^_]*\)_\([^_]*\)_\([^_]*\)_\(.*\)/\4/' Input_file
Or as per Bodo's nice suggestion:
sed 's/[^_]*_[^_]*_[^_]*_\(.*\)/\1/' Input_file
This might work for you (GNU sed):
sed 's/_/\n/3;s/.*\n//;t;s/Test2/\n/;s/.*\n//;t;d' file
Replace the third _ with a newline and then remove everything up to and including the first newline. If this succeeds, bail out and print the result. Otherwise, try the same method with Test2, and if this fails too, delete the entire line.
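As a quick check, the interval form and a capture-group form can be run side by side on the file name from the question. This is only a sketch; it pipes the name through printf instead of using a here-string so that it does not depend on bash.

```shell
name='Test_Test1_Test2_10082019_030013.csv.20191008-075740'

# Interval form: drop three "non-underscores followed by an underscore" groups.
by_interval=$(printf '%s\n' "$name" | sed 's/^\([^_]*_\)\{3\}//')

# Capture form: keep only what follows the third underscore.
by_capture=$(printf '%s\n' "$name" | sed 's/^[^_]*_[^_]*_[^_]*_\(.*\)/\1/')

printf '%s\n%s\n' "$by_interval" "$by_capture"
```

Both commands print the part of the name after the third underscore.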

Select specific items from a file using sed

I'm very much a junior when it comes to the sed command, and my Bruce Barnett guide sits right next to me, but one thing has been troubling me. With a file, can you filter it using sed to select only specific items? For example, in the following file:
alpha|november
bravo|october
charlie|papa
alpha|quebec
bravo|romeo
charlie|sahara
Would it be possible to set a command to return only the bravos, like:
bravo|october
bravo|romeo
With sed:
sed '/^bravo|/!d' filename
Alternatively, with grep (because it's sort of made for this stuff):
grep '^bravo|' filename
or with awk, which works nicely for tabular data,
awk -F '|' '$1 == "bravo"' filename
The first two use a regular expression, selecting those lines that match it. In ^bravo|, ^ matches the beginning of the line and bravo| the literal string bravo|, so this selects all lines that begin with bravo|.
The awk way splits the line across the field separator | and selects those lines whose first field is bravo.
You could also use a regex with awk:
awk '/^bravo\|/' filename
...but I don't think this plays to awk's strengths in this case.
Another solution with sed:
sed -n '/^bravo|/p' filename
-n option => no printing by default.
If line begins with bravo|, print it (p)
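The two sed styles (delete what does not match, or print only what matches) can be compared on the sample data. A minimal sketch, with the data held in a shell variable rather than a file:

```shell
data=$(printf '%s\n' 'alpha|november' 'bravo|october' 'charlie|papa' \
                     'alpha|quebec' 'bravo|romeo' 'charlie|sahara')

# Delete every line that does NOT start with "bravo|".
kept=$(printf '%s\n' "$data" | sed '/^bravo|/!d')

# Suppress default output (-n) and print only the matching lines.
printed=$(printf '%s\n' "$data" | sed -n '/^bravo|/p')

printf '%s\n' "$kept"
```

In sed's default basic regular expressions an unescaped | is a literal pipe character, which is why it needs no backslash here.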
There are (at least) two ways with sed.
Removing the unwanted lines:
sed '/^bravo|/ !d' YourFile
Printing only the wanted lines:
sed -n '/^bravo|/ p' YourFile
If there is no other constraint or action, both give the same result and grep is the better tool.
If further commands follow, the behavior can differ: d ends the cycle and moves straight to the next line, whereas p prints the line and then continues with the following commands.
Note that the pipe must not be escaped here: in a basic regular expression an unescaped | is a literal character, while GNU sed treats \| as alternation, so /^bravo\|/ would also match the empty string and therefore every line.

Matching strings even if they start with white spaces in SED

I'm having issues matching strings even if they start with any number of white spaces. It's been very little time since I started using regular expressions, so I need some help
Here is an example. I have a file (file.txt) that contains two lines
#String1='Test One'
String1='Test Two'
Im trying to change the value for the second line, without affecting line 1 so I used this
sed -i "s|String1=.*$|String1='Test Three'|g"
This changes the values for both lines. How can I make sed change only the value of the second string?
Thank you
With GNU sed, you can match whitespace using \s, while other sed implementations usually work with the [[:space:]] character class. So, pick one of these:
sed 's/^\s*AWord/AnotherWord/'
sed 's/^[[:space:]]*AWord/AnotherWord/'
Since you're using -i, I assume GNU sed. Either way, you probably shouldn't retype your word, as that introduces the chance of a typo. I'd go with:
sed -i "s/^\(\s*String1=\).*/\1'New Value'/" file
Move the \s* outside of the parens if you don't want to preserve the leading whitespace.
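Here is a small sketch of that approach on the two lines from the question. The leading spaces on the second line are an assumption added for illustration, and [[:space:]] is used instead of \s so that the command does not rely on GNU sed.

```shell
# Two sample lines; the indentation of the second one is assumed.
input=$(printf '%s\n' "#String1='Test One'" "   String1='Test Two'")

# Anchor at line start, allow optional whitespace, and reuse the prefix via \1.
result=$(printf '%s\n' "$input" | sed "s/^\([[:space:]]*String1=\).*/\1'Test Three'/")

printf '%s\n' "$result"
```

The commented line is untouched because # is not whitespace, so the anchored pattern cannot match it.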
There are a couple of solutions you could use to go about your problem
If you want to ignore lines that begin with a comment character such as '#' you could use something like this:
sed -i "/^\s*#/! s|String1=.*$|String1='Test Three'|g" file.txt
which operates only on lines that do not match (the trailing !) the regular expression /^\s*#/: start of line ^, optional whitespace \s*, then an octothorpe #.
The other option is to include the characters before 'String' as part of the substitution. Doing it this way means you'll need to capture \(...\) the group to include it in the output with \1
sed -i "s|^\(\s*\)String1=.*$|\1String1='Test Four'|g" file.txt
With GNU sed, try:
sed -i "s|^\s*String1=.*$|String1='Test Three'|" file
or
sed -i "/^\s*String1=/s/=.*/='Test Three'/" file
Using awk you could do:
awk '/String1/ && f++ {$2="Test Three"}1' FS=\' OFS=\' file
#String1='Test One'
String1='Test Three'
It skips the first line containing String1 because f++ evaluates to 0 (false) on the first match and to a non-zero value on later ones.

sed to replace only matching part in search string

I have a file that contains:
Lorem ipsum dolem file1.jar.
file1.jar (MD5: 12345678901234567890123456789012)
file2.jar (MD5: 09876543210987654321098765432109)
file3.jar (MD5: 24681357902468135790246813579024)
and I'd like to replace the first MD5. This sed command does the job:
sed "s/file1.*MD5\:\(.*\)/file1.jar \(MD5\: `md5 file1.jar | awk '{print $4}'`\)/"
Is there a way to tell sed to replace only the matching group while leaving the rest of the line alone? For example:
sed "s/file1.*MD5\:\(.*\)/`md5 file1.jar | awk '{print $4}'`/"
You can use a search to specify the line to match, and then a simpler regex in the substitute:
sed "/file1\.jar (MD5: [0-9A-Fa-f]*)/s/(MD5: [^)]*)/(MD5: $(md5 file1.jar | awk '{print $4}'))/"
That uses the $(...) notation to run the command. The tricky bit in that is at the end, where the sequence ))/" appears. The first close parenthesis is the end of the $(...) notation; the second is a character in the replacement text.
The first regex /file1\.jar (MD5: [0-9A-Fa-f]*)/ specifies fairly precisely the line to be matched. Then, knowing it is the correct line, the pattern in the substitute can be simpler: the search part /(MD5: [^)]*)/ looks for just the parenthesized MD5 data, safe in the knowledge that even though many other lines contain the same pattern, the substitution will only be applied to the one desired line.
I might be inclined to use:
md5=$(md5 file1.jar | awk '{print $4}')
sed "/file1\.jar (MD5: [0-9A-Fa-f]*)/ s/(MD5: [^)]*)/(MD5: $md5)/"
which clarifies what's what considerably (and doesn't involve a horizontal scroll bar on SO). You could be even more precise in the line matching pattern:
md5=$(md5 file1.jar | awk '{print $4}')
sed "/^file1\.jar (MD5: [0-9A-Fa-f]\{32\})\$/ s/(MD5: [^)]*)/(MD5: $md5)/"
That insists on exactly 32 hex digits and the close parenthesis at the end of the line.
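The address-plus-substitute pattern can be tried without the md5 tool by using a placeholder hash. In the sketch below the value of md5 is made up, not computed from a real file, and the listing is a shortened copy of the question's sample.

```shell
listing=$(printf '%s\n' 'Lorem ipsum dolem file1.jar.' \
                        'file1.jar (MD5: 12345678901234567890123456789012)' \
                        'file2.jar (MD5: 09876543210987654321098765432109)')
md5=0123456789abcdef0123456789abcdef   # placeholder hash for illustration

# The address selects only the file1.jar line; the substitute rewrites its MD5.
result=$(printf '%s\n' "$listing" |
  sed "/^file1\.jar (MD5: [0-9A-Fa-f]\{32\})\$/ s/(MD5: [^)]*)/(MD5: $md5)/")

printf '%s\n' "$result"
```

Only the second line changes; the prose line mentioning file1.jar and the file2.jar entry are not selected by the address.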
One of the comments asks:
Can sed operate in such a way that the replacement string replaces only the matching groups in the search pattern? For example, given 's/A B \(D\)/C/', it outputs A B C.
If I understand the (clarification of the) question, then you can do what you want with appropriate capturing - but the replacement part will have to specify exactly what you want as output (no shortcuts like you seem to be after). So, for the example, you would write something like:
s/\(A B \)\(D\)/\1C/
Here the \(D\) does not need the capturing parentheses, since the captured material is not used in the replacement, so you could write either of:
s/\(A B \)D/\1C/
s/\(A B\) D/\1 C/
You could also do:
/A B / s/D/C/
This has a search (for the A B sequence) and then the substitute looks for D and replaces it with C. This is basically what the main answer is suggesting. You can probably also do:
/\(A B\) D/ s//\1 C/
The 'empty search' should repeat the match, but the replacement has to be written out in full, and that is effectively the same as one of the previous commands:
s/\(A B\) D/\1 C/
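For a concrete check, two of the forms above can be run on the literal input A B D. This is a minimal sketch; the variable names are only for illustration.

```shell
# Capture the part to keep and write the replacement out in full.
with_group=$(printf 'A B D\n' | sed 's/\(A B \)D/\1C/')

# Select the line with an address, then substitute only the D.
with_address=$(printf 'A B D\n' | sed '/A B / s/D/C/')

printf '%s\n%s\n' "$with_group" "$with_address"
```

Both print A B C.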
This should do it (untested):
sed -E "s/(file1\.jar \(MD5: ).{32}/\1`md5 file1.jar | awk '{print $4}'`/"
With -E, the unescaped parentheses form the capture group and \( is a literal parenthesis. The .{32} stands for the 32 characters of the old hash.