I am using sed to replace 14 different abbreviations like CA_23456, CB_scaffold34532,... with 'proper' names in a file and it works putting it all on one line.
acc=$1
sed -e 's/CA_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_arizonica/;s/CB_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_bakeri/;s/CM_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_macrocarpa/;s/CS_[A-Z]*[a-z]*[0-9]*/Cupressus_sempervirens/;s/CT_[A-Z]*[a-z]*[0-9]*/Cupressus_torulosa/;s/JD_[A-Z]*[a-z]*[0-9]*/Juniperus_drupacea/;s/JF_[A-Z]*[a-z]*[0-9]*/Juniperus_flaccida/;s/JI_[A-Z]*[a-z]*[0-9]*/Juniperus_indica/;s/JP_[A-Z]*[a-z]*[0-9]*/Juniperus_phoenicea/;s/JX_[A-Z]*[a-z]*[0-9]*/Juniperus_procera/;s/JS_[A-Z]*[a-z]*[0-9]*/Juniperus_scopulorum/;s/MD_[A-Z]*[a-z]*[0-9]*/Microbiota_decussata/;s/XN_[A-Z]*[a-z]*[0-9]*/Xanthocyparis_nootkatensis/;s/XV_[A-Z]*[a-z]*[0-9]*/Xanthocyparis_vietnamensis/' ${acc}.nex > ${acc}_replaced.nex
To make it more readable I'd like to have the command split over multiple lines using '\' (not all the replacements are shown for brevity)
acc=$1
sed -e 's/CA_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_arizonica/;\
s/CB_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_bakeri/;\
s/CM_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_macrocarpa/'\
${acc}.nex > ${acc}_replaced.nex
However, I get an error message: sed: -e expression #1, char 168: unterminated address regex. I have looked at the answers to similar problems on various webforums and tried various things (using 's/.../.../' on every line, leaving ';' out,....) but I can't get it to work. What am I doing wrong?
Drop the \ that escapes the newlines. (They are not actually doing it!, they are interpreted as wrong syntax by sed). However I would suggest to put it into a file and run it like this:
sed -f script.sed input
where script.sed looks like this:
s/CA_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_arizonica/
s/CB_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_bakeri/
s/CM_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_macrocarpa/
Remove the backslashes from the sed code.
Inside singly-quoted shell strings, backslashes are not needed to escape newlines and are not removed because they are not parsed as escape characters. This has the effect that sed sees them as part of its code, and it then expects to find an address regex with a different delimiter than / before the command ends at the next newline (similar to \,/home/, !d). This address regex does not appear (nor an associated command), and so sed complains about invalid code.
Apart from that: The semicolons in the sed code are no longer necessary when you terminate commands with newlines, and anything involving shell variables should be quoted to avoid splitting in case of whitespace.
In sum:
sed -e 's/CA_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_arizonica/
s/CB_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_bakeri/
s/CM_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_macrocarpa/' \
"${acc}.nex" > "${acc}_replaced.nex"
Related
The goal is to use sed to return only the url from each line of FF extension Mining Blocker which uses this format for its regex lines:
{"baseurl":"*://002.0x1f4b0.com/*", "suburl":"*://*/002.0x1f4b0.com/*"},
{"baseurl":"*://003.0x1f4b0.com/*", "suburl":"*://*/003.0x1f4b0.com/*"},
the result should be:
002.0x1f4b0.com
003.0x1f4b0.com
One way would be to keep everything after suburl":"*://*/ then remove each occurrence of /*"},
I found https://unix.stackexchange.com/questions/24140/return-only-the-portion-of-a-line-after-a-matching-pattern but the special characters are a problem.
this won't work:
sed -n -e s#^.*suburl":"*://*/##g hosts
Would someone please show me how to mark the 2 asterisks in the string so they are seen by regex as literal characters, not wildcards?
edit:
sed -n 's#.*://\*/\([^/]\+\)/.*#\1#p' hosts
doesn't work, unfortunately.
regarding character substitution, thanks for directing me to the references.
I reduced the searched-for string to //*/ and used ASCII character codes like this:
sed -n -e s#^.*\d047\d047\d042\d047##g hosts
Unfortunately, that didn't output any changes to the lines.
My assumptions are:
^.*something specifies everything up to and including the last occurrence of "something" in a line
sed -n -e s#search##g deletes (replace with nothing) "search" within a line
So, this line:
sed -n -e s#^.*\d047\d047\d042\d047##g hosts
Should output everything after //*/ in each line...except it doesn't.
What is incorrect with that line?
Regarding deleting everything including and after the first / AFTER that first operation, yes, that's wanted too.
This might work for you (GNU sed):
sed -n 's#.*://\*/\([^/]\+\)/.*#\1#p' file
Match greedily (the longest string that matches) all characters up to ://*/, followed by a group of characters (which will be referred to as \1) that do not match a /, followed by the rest of the line and replace it by the group \1.
N.B. the sed substitution delimiters are arbitrary, in this case chosen to be # so as make pattern matching / easier. Also the character * on the left hand side of the substitution command may be interpreted as a meta character that means zero or more of the previous character/group and so is quoted \* so that it does not mistakenly exert this property. Finally, using the option -n toggles off the usual printing of every thing in the pattern space after all the sed commands have been executed. The p flag on the substitution command, prints the pattern space following a successful substitution, therefore only URL's will appear in the output or nothing.
I'm using Papers3 to export a Bibtex library, but there are errors with some of the accents. Many come out missing a curly brace, and therefore give a compilation error when I run bibtex. Here is an example of an error in the bib file:
author = {Combi, J A and Rib{\'o}, M and Mart{\'{\i}, J and Chaty, S.},
I want to replace all instances of these in my file (agn.bib), using something like:
sed "s/{\'{\i}/{\'i}/g" agn.bib
But that doesn't do anything, and I can't find the answer on Stack Overflow how to do it.
You'll have to escape the backslash twice; once for the shell, once for the sed:
sed -i "s/{\\\\'{\\\\i}/{\\\\'i}/g" file
since backslash is a metacharacter for both.
When you say sed "\\\\", due to double quotes, sed actually receives \\, and according to the rules of basic regular expressions, treats the backslash as a literal character.
This might work for you (GNU sed):
sed -i 's/{\\'\''{\\i}/{\\'\''i}/g' file
Replace a \ by quoting it i.e. \\ and match a' by passing it through to the shell i.e. '\''.
I have a simple sed command that I am using to replace everything between (and including) //thistest.com-- and --thistest.com with nothing (remove the block all together):
sudo sed -i "s#//thistest\.com--.*--thistest\.com##g" my.file
The contents of my.file are:
//thistest.com--
zone "awebsite.com" {
type master;
file "some.stuff.com.hosts";
};
//--thistest.com
As I am using # as my delimiter for the regex, I don't need to escape the / characters. I am also properly (I think) escaping the . in .com. So I don't see exactly what is failing.
Why isn't the entire block being replaced?
You have two problems:
Sed doesn't do multiline pattern matches—at least, not the way you're expecting it to. However, you can use multiline addresses as an alternative.
Depending on your version of sed, you may need to escape alternate delimiters, especially if you aren't using them solely as part of a substitution expression.
So, the following will work with your posted corpus in both GNU and BSD flavors:
sed '\#^//thistest\.com--#, \#^//--thistest\.com# d' /tmp/corpus
Note that in this version, we tell sed to match all lines between (and including) the two patterns. The opening delimiter of each address pattern is properly escaped. The command has also been changed to d for delete instead of s for substitute, and some whitespace was added for readability.
I've also chosen to anchor the address patterns to the start of each line. You may or may not find that helpful with this specific corpus, but it's generally wise to do so when you can, and doesn't seem to hurt your use case.
# separation by line with 1 s//
sed -n -e 'H;${x;s#^\(.\)\(.*\)\1//thistest.com--.*\1//--thistest.com#\2#;p}' YourFile
# separation by line with address pattern
sed -e '\#//thistest.com--#,\#//--thistest.com# d' YourFile
# separation only by char (could be CR, CR/LF, ";" or "oneline") with s//
sed -n -e '1h;1!H;${x;s#//thistest.com--.*\1//--thistest.com##;p}' YourFile
Note:
assuming there is only 1 section thistest per file (if not, it remove anything between the first opening until the last closing section) for the use of s//
does not suite for huge file (load entire file into memory) with s//
sed using addresses pattern cannot select section on the same line, it search 1st pattern to start, and a following line to stop but very efficient on big file and/or multisection
I have one file named `config_3_setConfigPW.ldif? containing the following line:
{pass}
on terminal, I used following commands
SLAPPASSWD=Pwd&0011
sed -i "s#{pass}#$SLAPPASSWD#" config_3_setConfigPW.ldif
It should replace {pass} to Pwd&0011 but it generates Pwd{pass}0011.
The reason is that the SLAPPASSWD shell variable is expanded before sed sees it. So sed sees:
sed -i "s#{pass}#Pwd&0011#" config_3_setConfigPW.ldif
When an "&" is on the right hand side of a pattern it means "copy the matched input", and in your case the matched input is "{pass}".
The real problem is that you would have to escape all the special characters that might arise in SLAPPASSWD, to prevent sed doing this. For example, if you had character "#" in the password, sed would think it was the end of the substitute command, and give a syntax error.
Because of this, I wouldn't use sed for this. You could try gawk or perl?
eg, this will print out the modified file in awk (though it still assumes that SLAPPASSWD contains no " character
awk -F \{pass\} ' { print $1"'${SLAPPASSWD}'"$2 } ' config_3_setConfigPW.ldif
That's because$SLAPPASSWD contains the character sequences & which is a metacharacter used by sed and evaluates to the matched text in the s command. Meaning:
sed 's/{pass}/match: &/' <<< '{pass}'
would give you:
match: {pass}
A time ago I've asked this question: "Is it possible to escape regex metacharacters reliably with sed". Answers there show how to reliably escape the password before using it as the replacement part:
pwd="Pwd&0011"
pwdEscaped="$(sed 's/[&/\]/\\&/g' <<< "$pwd")"
# Now you can safely pass $pwd to sed
sed -i "s/{pass}/$pwdEscaped/" config_3_setConfigPW.ldif
Bear in mind that sed NEVER operates on strings. The thing sed searches for is a regexp and the thing it replaces it with is string-like but has some metacharacters you need to be aware of, e.g. & or \<number>, and all of it needs to avoid using the sed delimiters, / typically.
If you want to operate on strings you need to use awk:
awk -v old="{pass}" -v new="$SLAPPASSWD" 's=index($0,old){ $0 = substr($0,1,s-1) new substr($0,s+length(old))} 1' file
Even the above would need tweaked if old or new contained escape characters.
I'm having issues matching strings even if they start with any number of white spaces. It's been very little time since I started using regular expressions, so I need some help
Here is an example. I have a file (file.txt) that contains two lines
#String1='Test One'
String1='Test Two'
Im trying to change the value for the second line, without affecting line 1 so I used this
sed -i "s|String1=.*$|String1='Test Three'|g"
This changes the values for both lines. How can I make sed change only the value of the second string?
Thank you
With gnu sed, you match spaces using \s, while other sed implementations usually work with the [[:space:]] character class. So, pick one of these:
sed 's/^\s*AWord/AnotherWord/'
sed 's/^[[:space:]]*AWord/AnotherWord/'
Since you're using -i, I assume GNU sed. Either way, you probably shouldn't retype your word, as that introduces the chance of a typo. I'd go with:
sed -i "s/^\(\s*String1=\).*/\1'New Value'/" file
Move the \s* outside of the parens if you don't want to preserve the leading whitespace.
There are a couple of solutions you could use to go about your problem
If you want to ignore lines that begin with a comment character such as '#' you could use something like this:
sed -i "/^\s*#/! s|String1=.*$|String1='Test Three'|g" file.txt
which will only operate on lines that do not match the regular expression /.../! that begins ^ with optional whiltespace\s* followed by an octothorp #
The other option is to include the characters before 'String' as part of the substitution. Doing it this way means you'll need to capture \(...\) the group to include it in the output with \1
sed -i "s|^\(\s*\)String1=.*$|\1String1='Test Four'|g" file.txt
With GNU sed, try:
sed -i "s|^\s*String1=.*$|String1='Test Three'|" file
or
sed -i "/^\s*String1=/s/=.*/='Test Three'/" file
Using awk you could do:
awk '/String1/ && f++ {$2="Test Three"}1' FS=\' OFS=\' file
#String1='Test One'
String1='Test Three'
It will ignore first hits of string1 since f is not true.