sed behaves differently when getting script from file - sed

I need to extract strings that may contain the \n sequence from a file.
Note that in this case \n is not a newline, but simply the sequence of two characters: a backslash followed by a lower-case n.
As an example, if I type:
echo garbage "useful \n string" garbage | sed -e s/.*\(\"[^\"]*\"\).*/\1/
the result is correctly:
"useful \n string"
If instead I save the sed command to a file (b.txt) and type:
echo garbage "useful \n string" garbage | sed -f b.txt
which should do the same thing, instead I don't see any match, and the result is:
garbage "useful \n string" garbage
Note that the file contains just the sed command; the input string still comes from stdin.
I am using GNU sed version 4.2.1 in a Windows 10 command line window.
Any suggestions?

Your quoting and escape sequences look suspicious. Try this:
$ echo 'garbage "useful \n string" garbage' | sed 's/.*\("[^"]*"\).*/\1/'
"useful \n string"
$ cat b.txt
s/.*\("[^"]*"\).*/\1/
$ echo 'garbage "useful \n string" garbage' | sed -f b.txt
"useful \n string"

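A likely explanation for the difference, offered as a sketch rather than a confirmed diagnosis: on the command line the backslashes in \" are consumed before sed sees the script, but in a script file sed reads them as written. Inside a bracket expression a backslash is an ordinary character, so [^\"] means "anything except a backslash or a double quote" and cannot get past the \n in the input. The temporary script files and variable names below exist only for this demonstration.

```shell
# Demonstration: the same s command with and without backslash-escaped quotes,
# both read from script files with -f.
input='garbage "useful \n string" garbage'

bad_script=$(mktemp)
good_script=$(mktemp)

# Script as it would look if copied verbatim from the command line.
printf '%s\n' 's/.*\(\"[^\"]*\"\).*/\1/' > "$bad_script"
# Script with the command-line escaping removed.
printf '%s\n' 's/.*\("[^"]*"\).*/\1/' > "$good_script"

# [^\"] also excludes the backslash, so nothing matches and the line is unchanged.
bad_out=$(printf '%s\n' "$input" | sed -f "$bad_script")
# [^"] excludes only the quote, so the quoted string is extracted.
good_out=$(printf '%s\n' "$input" | sed -f "$good_script")

printf 'escaped quotes:   %s\n' "$bad_out"
printf 'unescaped quotes: %s\n' "$good_out"

rm -f "$bad_script" "$good_script"
```

printf '%s\n' is used instead of echo because some shells' echo would turn the \n in the sample into a real newline.
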
Pair the '' and "" quotes correctly, and in GNU sed use the -E option (extended regular expressions) instead of -e:
$ echo 'garbage "useful \n string" garbage' | sed -E 's/.*("[^"]*").*/\1/'
"useful \n string"
$ cat b.txt
s/.*("[^"]*").*/\1/
$ echo 'garbage "useful \n string" garbage' | sed -Ef b.txt
"useful \n string"

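To make the option names concrete: -e and -E do different jobs. -e introduces a script expression, while -E switches the regex syntax to extended, where groups are written ( ) rather than \( \). A small sketch comparing the two spellings on the sample input (the variable names are illustrative only):

```shell
input='garbage "useful \n string" garbage'

# Basic regular expression: capture groups are written \( ... \).
bre_out=$(printf '%s\n' "$input" | sed -e 's/.*\("[^"]*"\).*/\1/')

# Extended regular expression (-E): capture groups are written ( ... ).
# -E and -e can be combined, since one selects the syntax and the other
# supplies the script.
ere_out=$(printf '%s\n' "$input" | sed -E -e 's/.*("[^"]*").*/\1/')

printf '%s\n' "$bre_out" "$ere_out"
```
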
Related

Removing repeated characters with sed command

How to remove repeated characters or symbols in a string
some text\n\n\n some other text\n\n more text\n
How can I make something like this using sed or another command?
some text\n some other text\n more text\n
I can remove \n with something like sed s/\n//g, but that removes every occurrence.
You can use
sed '/^$/d' file > newfile
In GNU sed, you can edit the file in place with the -i option:
sed -i '/^$/d' file
With macOS or FreeBSD sed, in-place editing can be done with either of:
sed -i '' '/^$/d' file
sed -i.bak '/^$/d' file
See the online demo:
#!/bin/bash
s=$(echo -e "some text\n\n\n some other text\n\n more text\n")
sed '/^$/d' <<< "$s"
Output:
some text
some other text
more text
You can also use tr if it supports squeezing (--squeeze-repeats is the GNU long form of -s).
$ echo -e 'ab\n\ncd' | tr --squeeze-repeats '\n'
ab
cd
Given the following [input] or a file that is similar:
printf "some text\n\n\n some other text\n\n more text\n" | [ one of the pipes below... ]
Any of these work:
[input] | sed -n '/[^[:space:]]/p'
Or:
[input] | sed '/^$/d'
Or, if you also want to drop lines that contain only spaces or tabs:
[input] | sed '/^[[:blank:]]*$/d'
Or with awk:
[input] | awk 'NF'
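As a rough side-by-side of the filters above, the sketch below adds one whitespace-only line to the sample input (an assumption made here to show the difference) and runs three of the commands on it. The variable names are illustrative.

```shell
# Sample input: two runs of empty lines plus one line holding only a space and a tab.
input=$(printf 'some text\n\n\n some other text\n \t\n\n more text\n')

# Delete lines that are completely empty; the whitespace-only line survives.
empty_only=$(printf '%s\n' "$input" | sed '/^$/d')

# Delete lines that are empty or contain only spaces and tabs.
blank_too=$(printf '%s\n' "$input" | sed '/^[[:blank:]]*$/d')

# awk prints a line only when it has at least one field, so it also drops
# whitespace-only lines.
awk_out=$(printf '%s\n' "$input" | awk 'NF')

printf '%s\n' "$blank_too"
```
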

Grep Text between pattern and space

I have data in the format below.
sometext NAME=abc TIME_TAKEN
sometext NAME=xyz{123} REQUEST
I am trying to extract the text between NAME= and the next space. I want output in the format below:
abc
xyz{123}
I tried using sed along with cat, and also awk, as shown below, but it is not working.
cat test.txt | sed 's/NAME=\(.*\)"\+[[:space:]]\+"/\1/g'
awk 'sub(/.*NAME= +/,"") && sub(/ +.*/,"")' test.txt
Match everything before and after the assignment:
sed -ne 's/.* NAME=\(.*\) .*/\1/p' test.txt
Beware! It will extract the last NAME=... from each line.
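The caveat is easy to see on a line with two NAME= fields. The sketch below also shows what happens when more fields follow the value, and a variant that captures only non-space characters; that variant assumes the values never contain spaces. The sample lines with extra fields are made up for illustration.

```shell
# Greedy .* before NAME= means the LAST " NAME=" on the line is the one matched.
last=$(printf '%s\n' 'x NAME=first y NAME=second z' |
  sed -ne 's/.* NAME=\(.*\) .*/\1/p')

# With extra fields after the value, \(.*\) runs on to the last space.
too_much=$(printf '%s\n' 'sometext NAME=abc TIME_TAKEN 5' |
  sed -ne 's/.* NAME=\(.*\) .*/\1/p')

# Variant (assumes values contain no spaces): capture only non-space characters.
values=$(printf '%s\n' 'sometext NAME=abc TIME_TAKEN 5' 'sometext NAME=xyz{123} REQUEST' |
  sed -ne 's/.* NAME=\([^ ]*\).*/\1/p')

printf '%s\n' "$last" "$too_much" "$values"
```
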
This might work for you (GNU sed):
sed -En 's/.*\<NAME=(\S*).*/\1/p' file
Turn off implicit printing with -n and turn on extended regular expressions with -E.
Match the word NAME followed by =, capture the run of non-whitespace characters that follows, and return only that captured text.
N.B. The replacement must replace the entire line, hence the .* at the start and end of the regexp.
Try the following command:
sed 's#.*NAME=\([^ ]*\).*#\1#' test.txt
Explanation:
.*NAME= matches everything up to and including NAME=
\([^ ]*\) captures everything up to the next space
.* skips the rest of the line

Escape line beginning and end in bracket expressions in sed

How do you escape line beginning and line end in bracket expressions in sed?
For example, let's say I want to replace every comma, the line beginning, and the line end in each line with a pipe:
echo "a,b,c" | sed 's/,/|/g'
# a|b|c
echo "a,b,c" | sed 's/^/|/g'
# |a,b,c
echo "a,b,c" | sed 's/$/|/g'
# a,b,c|
echo "a,b,c" | sed 's/[,^$]/|/g'
# a|b|c
I would expect the last command to produce |a|b|c|. I also tried escaping the line beginning and line end via backslash, with no change.
With GNU sed with extended regular expressions, you can do:
$ echo "a,b,c" | /opt/gnu/bin/sed -E 's/^|,|$/|/g'
|a|b|c|
$
The -E option enables the extended regular expressions, as does -r, but -E is also used by other sed variants for the same purpose, unlike -r.
However, for reasons which elude me, the BSD (macOS) variant of sed produces:
$ echo "a,b,c" | sed -E 's/^|,|$/|/g'
|a|b|c
$
I can't think why.
If this variability is unacceptable, go with the three-substitution solution:
$ echo "a,b,c" | sed -e "s/^/|/" -e "s/$/|/" -e "s/,/|/g"
|a|b|c|
$
which should work with any variant of sed. However, note that echo "" | sed …3 subs… produces || whereas the -E variant produces |. I'm not sure if there's an easy fix for that.
You tried this, but it didn't do what you wanted:
$ echo "a,b,c" | sed 's/[,^$]/|/g'
a|b|c
$
This is what should be expected. Inside character classes, most special characters lose their special-ness. There is nothing special about $ (or , but it isn't a metacharacter anyway) in a character class; ^ is only special at the start of the class and it negates the character class. That means that what follows shows the correct, expected behaviour from this permutation of the contents of your character class:
$ echo "a,b\$\$b,c" | sed 's/[^,$]/|/g'
|,|$$|,|
$
It mapped all the non-comma, non-dollar characters to pipes. I should be using single quotes around the echo; then the backslashes wouldn't be necessary. I just followed the question's code quietly.
The following sed command may help:
echo "a,b,c" | sed 's/^/|/;s/,/|/g;s/$/|/'
Output will be as follows.
|a|b|c|
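A short sketch of the point about bracket expressions: inside [ ], ^ (when not first) and $ match the literal characters, so anchors have to be handled by separate substitutions. The sample strings are chosen here for illustration.

```shell
# Inside a bracket expression, ^ (when not first) and $ are literal characters,
# so [,^$] matches a comma, a caret, or a dollar sign, never a line boundary.
literal=$(printf '%s\n' 'a^b$c,d' | sed 's/[,^$]/|/g')

# Anchors have to live outside the brackets, one substitution each.
wrapped=$(printf '%s\n' 'a,b,c' | sed -e 's/^/|/' -e 's/$/|/' -e 's/,/|/g')

# An empty line still gets both pipes from the two anchor substitutions.
empty=$(printf '\n' | sed -e 's/^/|/' -e 's/$/|/' -e 's/,/|/g')

printf '%s\n' "$literal" "$wrapped" "$empty"
```
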

Verbatim Match with sed

I have a list of pairs of URLs - I want to find all occurrences of the first element of the pair and replace them with the second. I'm trying to use sed for this but sed escapes characters in my URL. Is there a way to make sed find these URLs (without changing my pairs)?
Here's my code:
while read -r NAME
do
ARG1=`echo "$NAME" | awk '{print $1}'`
ARG2=`echo "$NAME" | awk '{print $2}'`
echo "$ARG1"
echo "$ARG2"
sed -i "s#$ARG1#$ARG2#g" file
done < pagetable
pagetable has the pairs of URLs, and I'm doing the find and replace in 'file'. Since my URLs have special characters, sed isn't interpreting them verbatim.
Escape the metacharacters in the search pattern (\ * ^ $ . /) and in the replacement string (\ & /) before invoking sed. This assumes that the script is run by Bash.
ARG1="${ARG1//\\/\\\\}"
ARG1="${ARG1//\*/\\\*}"
ARG1="${ARG1//\//\\/}"
for mc in \^ \$ \.; do ARG1="${ARG1//$mc/\\$mc}"; done
ARG2="${ARG2//\\/\\\\}"
ARG2="${ARG2//\//\\/}"
ARG2="${ARG2//&/\\&}"
sed -i "s/$ARG1/$ARG2/g" file
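For shells without Bash's ${var//...} expansion, the same escaping can be sketched with sed itself. The helper names escape_pattern, escape_replacement and replace_literal are hypothetical, the sample URLs are made up, and the sketch assumes neither string contains a newline. It also escapes [ and ], which are special in a basic regular expression as well.

```shell
# Hypothetical helpers: escape a string so sed treats it literally.
# The inner sed commands use | as their delimiter so that / can sit in the
# bracket expression without ending the pattern early.

# Escape BRE metacharacters (] [ \ . * ^ $) and the / delimiter.
escape_pattern() {
  printf '%s\n' "$1" | sed 's|[][\.*^$/]|\\&|g'
}

# Escape the characters special in a replacement: \ & and the / delimiter.
escape_replacement() {
  printf '%s\n' "$1" | sed 's|[\&/]|\\&|g'
}

# Replace every literal occurrence of $1 with $2 in standard input.
replace_literal() {
  sed "s/$(escape_pattern "$1")/$(escape_replacement "$2")/g"
}

# The dot in the search string only matches a real dot, so the second URL is untouched.
printf '%s\n' 'see http://a.com/x.html and http://a.com/xyhtml' |
  replace_literal 'http://a.com/x.html' 'https://b.org/p?q=1&r=2'
```

Inside the loop from the question, this could be used as replace_literal "$ARG1" "$ARG2" < file > file.new, followed by moving file.new over file.
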

How to replace only last match in a line with sed?

With sed, I can replace the first match in a line using
sed 's/pattern/replacement/'
And all matches using
sed 's/pattern/replacement/g'
How do I replace only the last match, regardless of how many matches there are before it?
Copy pasting from something I've posted elsewhere:
$ # replacing last occurrence
$ # can also use sed -E 's/:([^:]*)$/-\1/'
$ echo 'foo:123:bar:baz' | sed -E 's/(.*):/\1-/'
foo:123:bar-baz
$ echo '456:foo:123:bar:789:baz' | sed -E 's/(.*):/\1-/'
456:foo:123:bar:789-baz
$ echo 'foo and bar and baz land good' | sed -E 's/(.*)and/\1XYZ/'
foo and bar and baz lXYZ good
$ # use word boundaries as necessary - GNU sed
$ echo 'foo and bar and baz land good' | sed -E 's/(.*)\band\b/\1XYZ/'
foo and bar XYZ baz land good
$ # replacing last but one
$ echo 'foo:123:bar:baz' | sed -E 's/(.*):(.*:)/\1-\2/'
foo:123-bar:baz
$ echo '456:foo:123:bar:789:baz' | sed -E 's/(.*):(.*:)/\1-\2/'
456:foo:123:bar-789:baz
$ # replacing last but two
$ echo '456:foo:123:bar:789:baz' | sed -E 's/(.*):((.*:){2})/\1-\2/'
456:foo:123-bar:789:baz
$ # replacing last but three
$ echo '456:foo:123:bar:789:baz' | sed -E 's/(.*):((.*:){3})/\1-\2/'
456:foo-123:bar:789:baz
Further Reading:
Buggy behavior if word boundaries are used inside a group with quantifiers - for example: echo 'it line with it here sit too' | sed -E 's/with(.*\bit\b){2}/XYZ/' fails
Greedy vs. Reluctant vs. Possessive Quantifiers
Reference - What does this regex mean?
sed manual: Back-references and Subexpressions
This might work for you (GNU sed):
sed 's/\(.*\)pattern/\1replacement/' file
The greedy \(.*\) first swallows the whole pattern space; the regexp engine then backtracks from the end of the line, so the first place where pattern can match is its last occurrence.
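A small sketch of the greedy approach on a colon-separated sample, next to an end-anchored variant that gives the same result when the pattern is a single character. The variable names are illustrative.

```shell
# The greedy \(.*\) first grabs the whole line, then backtracks just far
# enough for ":" to match, which lands on the last ":" in the line.
greedy=$(printf '%s\n' 'foo:123:bar:baz' | sed 's/\(.*\):/\1-/')

# Same idea anchored at the end: a ":" followed only by non-":" characters.
anchored=$(printf '%s\n' 'foo:123:bar:baz' | sed 's/:\([^:]*\)$/-\1/')

# A line with a single match behaves like a plain substitution.
single=$(printf '%s\n' 'foo:bar' | sed 's/\(.*\):/\1-/')

printf '%s\n' "$greedy" "$anchored" "$single"
```
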
A fun way to do this is to use rev to reverse the characters of each line and write your sed replacement backwards.
rev input_file | sed 's/nrettap/tnemecalper/' | rev