Sed remove multiple characters


I would like to replace multiple characters:
echo "R \e&p[%20])l(a/ce" | sed 's|%20|-|g;s|\[||g;s|]||g;s| ||g;s|#||g;s|/||g;s|)||g;s|(||g;s|&||g;s|\\||g'
Rep-lace
Is there another way of doing so or is this it?
I want to replace %20 with - and remove the rest.

I'd use
echo "R \e&p[%20])l(a/ce" | sed 's/%20/-/g; s/[][ #/()&\\]//g'
This is because the character set is easier to extend that way. The thing to know is that ] has to be the first character in the set to be recognized as part of the set rather than as the closing bracket.
Depending on what exactly you want to do, it may be worth considering inverting the character set instead and removing everything except a specified set of characters. For example:
echo "R \e&p[%20])l(a/ce" | sed 's/%20/-/g; s/[^-[:alnum:]]//g'
This will replace %20 with - and then remove all characters except - and alphanumeric characters.

In Bash you can combine parameter expansion with sed:
bash$ STR="R \e&p[%20])l(a/ce"
bash$ echo "${STR/"%20"/-}" | sed -r 's/[^a-z-]//gi'
Rep-lace
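For a single string, the sed step can also be done with a second parameter expansion, so no external command is needed at all. A minimal Bash-only sketch; the function name clean is just an illustrative choice:

```shell
#!/usr/bin/env bash
# Hypothetical helper: the same cleanup using only Bash parameter expansion.
clean() {
  local s=$1
  s=${s//"%20"/-}          # replace every %20 with - (quoted to keep % literal)
  s=${s//[^[:alnum:]-]/}   # delete every character that is not a letter, digit or -
  printf '%s\n' "$s"
}

clean 'R \e&p[%20])l(a/ce'   # prints Rep-lace
```

Both expansions use the // form, so every match is replaced, not just the first one.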

Related

sed command not working properly on ubuntu

I have a file named config_3_setConfigPW.ldif containing the following line:
{pass}
In the terminal, I used the following commands:
SLAPPASSWD='Pwd&0011'
sed -i "s#{pass}#$SLAPPASSWD#" config_3_setConfigPW.ldif
It should replace {pass} with Pwd&0011, but it produces Pwd{pass}0011.

The reason is that the SLAPPASSWD shell variable is expanded before sed sees it. So sed sees:
sed -i "s#{pass}#Pwd&0011#" config_3_setConfigPW.ldif
When an "&" appears on the right-hand side of a substitution, it means "copy the matched input", and in your case the matched input is "{pass}".
The real problem is that you would have to escape all the special characters that might arise in SLAPPASSWD to prevent sed from doing this. For example, if you had the character "#" in the password, sed would think it was the end of the substitute command and give a syntax error.
Because of this, I wouldn't use sed here. You could try gawk or perl instead.
For example, this prints the modified file using awk (though it still assumes that SLAPPASSWD contains no " or \ characters, since its value is pasted into an awk string literal):
awk -F '[{]pass[}]' '{ if (NF > 1) print $1 "'"$SLAPPASSWD"'" $2; else print }' config_3_setConfigPW.ldif

That's because $SLAPPASSWD contains the character &, which is a sed metacharacter: in the s command it evaluates to the matched text. For example:
sed 's/{pass}/match: &/' <<< '{pass}'
would give you:
match: {pass}
Some time ago I asked the question "Is it possible to escape regex metacharacters reliably with sed". The answers there show how to reliably escape the password before using it as the replacement part:
pwd="Pwd&0011"
pwdEscaped="$(sed 's/[&/\]/\\&/g' <<< "$pwd")"
# Now you can safely pass $pwdEscaped to sed
sed -i "s/{pass}/$pwdEscaped/" config_3_setConfigPW.ldif
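The escaping step can be wrapped in a small function. This is a sketch assuming single-line values and a /-delimited s command; escape_sed_repl is a hypothetical name. It backslash-escapes &, / and \, the characters that are special in the replacement part, and uses # as the delimiter of the escaping command itself so that no delimiter character sits inside its bracket expression:

```shell
# Hypothetical helper: escape a single-line string for use as the
# replacement part of a /-delimited sed s command.
escape_sed_repl() {
  # Prefix every &, / and \ with a backslash.
  printf '%s\n' "$1" | sed 's#[&/\]#\\&#g'
}

pwd='Pwd&0011/x\y'
pwdEscaped=$(escape_sed_repl "$pwd")
printf '%s\n' '{pass}' | sed "s/{pass}/$pwdEscaped/"   # prints Pwd&0011/x\y
```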

Bear in mind that sed NEVER operates on strings. The thing sed searches for is a regexp and the thing it replaces it with is string-like but has some metacharacters you need to be aware of, e.g. & or \<number>, and all of it needs to avoid using the sed delimiters, / typically.
If you want to operate on strings you need to use awk:
awk -v old="{pass}" -v new="$SLAPPASSWD" 's=index($0,old){ $0 = substr($0,1,s-1) new substr($0,s+length(old))} 1' file
Even the above would need to be tweaked if old or new contained backslash escape sequences, since awk processes them in -v assignments.
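One way to sidestep that caveat, assuming a POSIX awk, is to pass the strings through the environment instead of -v: values read from ENVIRON are not escape-processed. A minimal sketch (str_replace is a hypothetical name) that, unlike the one-liner above, replaces every occurrence on each line:

```shell
# Hypothetical helper: replace every literal (non-regex) occurrence of $1
# with $2 on stdin. The strings reach awk via ENVIRON, so backslashes,
# &, / and # in them are not treated specially.
str_replace() {
  old=$1 new=$2 awk '
    BEGIN { old = ENVIRON["old"]; new = ENVIRON["new"] }
    {
      rest = $0; out = ""
      # Copy the text before each match, append the new string, continue after it.
      while (old != "" && (s = index(rest, old)) > 0) {
        out = out substr(rest, 1, s - 1) new
        rest = substr(rest, s + length(old))
      }
      print out rest
    }'
}

SLAPPASSWD='Pwd&0011'
printf '%s\n' '{pass}' | str_replace '{pass}' "$SLAPPASSWD"   # prints Pwd&0011
```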

sed: replace all non-alpha numeric characters except ">"

I would like to replace all non-alphanumeric characters in lines that start with ">" but NOT replace the ">".
E.g.:
>header 44554%782 & -GB
would become
>header44554782GB
I would also like to know, more generally, how to specify multiple "protected" non-alphanumeric characters, for example if I wanted to keep ">" and spaces, or spaces and underscores.
This gets me halfway there (removes all non-alpha numeric).
sed '/^>/s/[^a-zA-Z0-9]//g'
Any ideas?
Update:
I did not provide enough information about my data structure.
An example of the text file I need to process is here:
>gi-565662%% 2s-0[protein]
MPPACTYUSYUUSUSUSUSUUSU
SKKKYTYSSALLATLLAY
>gi|47234377324|+98923[protein]
ATTYTYTFYATYFTTTFARRRLAVVVATPATYTKKKK
>gi|23432|bysg==+4D77
TYTYATCYACTAYCTYATYCTAC
ACTYATCYATCYATCYATC
TPAPPAPPCAPPAPCPAC

You could take your existing code, and re-insert the leading > after the substitution:
#!/usr/bin/sed -f
/^>/{
s/[^a-zA-Z0-9]//g
s/^/>/
}
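A quick way to check the script against data shaped like the update's sample is to give the same commands inline. In this sketch (clean_headers is a hypothetical wrapper name; the input lines are taken from the question), header lines are cleaned and re-prefixed while sequence lines pass through untouched:

```shell
# Hypothetical wrapper around the same two substitutions as the script.
clean_headers() {
  sed '/^>/{
s/[^a-zA-Z0-9]//g
s/^/>/
}'
}

printf '%s\n' '>gi|47234377324|+98923[protein]' 'ATTYTYTFYATYFTTTFARRRLAVVVATPATYTKKKK' |
  clean_headers
```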

sed (Stream EDitor) is capable of performing the operation you specified, but a simpler tool might be more appropriate. If your system has sed, it probably has tr, as well. With tr you could do:
$ hdr=$(echo '>header 44554%782 & -GB' | tr -dc '>a-zA-Z0-9')
$ echo "$hdr"
>header44554782GB
The -c option tells tr to match the complement of the set of characters specified in '>a-zA-Z0-9', while the -d option tells tr to delete the matched characters.

This may be simpler:
sed -r '/^[>].*/{s/[^[:alnum:]_> ]//g}'
Example:
echo '>header _44554%782? & -GB'|sed -r '/^[>].*/{s/[^[:alnum:]_> ]//g}'
Output:
>header _44554782  GB
Here >, _ and the space character are protected.
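More generally, any set of protected characters can go inside the negated bracket expression, with two placement rules: ] must come first (right after ^) and - must come last, otherwise they are read as the closing bracket or as a range. A sketch that protects >, space, _, - and ] (an arbitrary choice for illustration; keep_chars is a hypothetical name):

```shell
# Keep letters, digits, '>', ' ', '_', '-' and ']' on header lines only.
# ']' is placed first (after '^') and '-' last so both are taken literally.
keep_chars() {
  sed '/^>/s/[^]>[:alnum:] _-]//g'
}

printf '%s\n' '>hdr ]a-b%c[d' 'SEQ%1' | keep_chars
```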

You can do it like this in Java:
String result = yourString.replaceAll("[\\W&&[^>]]", "");
Or in JavaScript:
var txt = ">header 44554%782 & -GB";
var result = txt.replace(/[^>a-zA-Z0-9]/g, "");
alert(result); // >header44554782GB

Escaping characters in sed

I am trying to replace the value of a couple of PHP database array variables with sed, but it is not working as expected.
Here is an example:
echo \$DB['TYPE']='MYSQL' | sed "s|^$DB['TYPE']=.*$|$DB['TYPE']='POSTGRESQL'|g"
I'm trying to replace $DB['TYPE']='MYSQL' with $DB['TYPE']='POSTGRESQL'.
I escaped it this way, but it does not work; I keep getting $DB[TYPE]=MYSQL
echo \$DB['TYPE']='MYSQL' | sed "s|^\$DB[\'TYPE\']=.*$|\$DB[\'TYPE\']=\'POSTGRESQL\'|g"
Thanks in advance

I'm trying to replace $DB['TYPE']='MYSQL' with $DB['TYPE']='POSTGRESQL'
You can use:
sed "s|\(\$DB\['TYPE'\]=\)'MYSQL'|\1'POSTGRESQL'|g" file

Reduce duplication as much as possible:
$ echo "\$DB['TYPE']='MYSQL'" |
sed "s|^\(\$DB\['TYPE'\]='\)[^']*|\1POSTGRES|"
$DB['TYPE']='POSTGRES'
You use line anchors so the g modifier is useless -- the pattern can match at most once per line.
You need double quotes on the echo line. Without them, you get the shell seeing the single quotes as quote characters, not literal characters:
$ echo \$DB['TYPE']='MYSQL'
$DB[TYPE]=MYSQL

In your example sed doesn't find the exact phrase. Why don't you just replace the word you want to change, like this?
echo "\$DB['TYPE']='MYSQL'" | sed "s|MYSQL|POSTGRESQL|g"

I ended up going with this
echo "\$DB['TYPE']='MYSQL'" | sed "s|^\$DB\['TYPE'\]=.*$|\$DB\['TYPE'\]='POSTGRESQL'|"
It works as expected. Thanks guys.

sed rare-delimiter (other than & | / ?...)

I am using the Unix sed command on a string that can contain all types of characters (&, |, !, /, ?, etc).
Is there a complex delimiter (with two characters?) that can fix the error:
sed: -e expression #1, char 22: unknown option to `s'

The characters in the input file are of no concern - sed parses them fine. There may be an issue, however, if you have most of the common characters in your pattern - or if your pattern may not be known beforehand.
At least on GNU sed, you can use a non-printable character that is highly improbable to exist in your pattern as a delimiter. For example, if your shell is Bash:
$ echo '|||' | sed s$'\001''|'$'\001''/'$'\001''g'
In this example, Bash replaces $'\001' with the character that has the octal value 001 - in ASCII it's the SOH character (start of heading).
Since such characters are control/non-printable characters, it's doubtful that they will exist in the pattern. Unless, that is, you are doing something weird like modifying binary files - or Unicode files without the proper locale settings.
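The same idea also works without Bash's $'...' quoting. This sketch stores the SOH character in a variable with printf and uses it as the delimiter of an s command. The helper name soh_sub is hypothetical, and the pattern is still a regex, so regex metacharacters in it keep their meaning:

```shell
# Hypothetical helper: s command delimited by the SOH (\001) character, so
# /, | and # in the pattern need no escaping (& and \ in the replacement
# are still special).
soh_sub() {
  d=$(printf '\001')
  sed "s${d}$1${d}$2${d}g"
}

printf '%s\n' 'a/b|c#d' | soh_sub '/b|c#' 'X'   # prints aXd
```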

Another way to do this is to use shell parameter expansion (pattern substitution):
${parameter/pattern/replace} # substitute replace for pattern once
or
${parameter//pattern/replace} # substitute replace for pattern everywhere
Here is a quite complex example that is difficult with sed:
$ parameter="Common sed delimiters: [sed-del]"
$ pattern="\[sed-del\]"
$ replace="[/_%:\\#]"
$ echo "${parameter//$pattern/$replace}"
The result is:
Common sed delimiters: [/_%:\#]
However, this only works on shell variables, not on files, which is where sed excels.

There is no such option for multi-character expression delimiters in sed, but I doubt you need that. The delimiter character should not occur in the pattern, but if it appears in the string being processed, it's not a problem. And unless you're doing something extremely weird, there will always be some character that doesn't appear in your search pattern that can serve as a delimiter.

You need the nested delimiter facility that Perl offers. It lets you do matching, substitution and transliteration without worrying about the delimiter being included in your contents. Since perl is a superset of sed, you should be able to use it for whatever you’ve used sed for.
Consider this:
$ perl -nle 'print if /something/' inputs
Now if your something contains a slash, you have a problem. The way to fix this is to change the delimiter, preferably to a bracketing one. So for example, you could have anything you like in the $WHATEVER shell variable (provided the brackets are balanced), which gets interpolated by the shell before Perl is even called here:
$ perl -nle "print if m($WHATEVER)" /usr/share/dict/words
That works even if you have correctly nested parens in $WHATEVER. The four bracketing pairs which correctly nest like this in Perl are < >, ( ), [ ], and { }. They allow arbitrary contents that include the delimiter if that delimiter is balanced.
If it is not balanced, then do not use a delimiter at all. If the pattern is in a Perl variable, you don’t need to use the match operator provided you use the =~ operator; wrap the variable in quotemeta if its contents should be matched literally rather than as a regex, so:
$whatever = "some arbitrary string ( / # [ etc";
if ($line =~ quotemeta($whatever)) { ... }
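When the text comes from a shell variable, another option is to hand it to Perl through the environment and quote it with \Q...\E (quotemeta), so it is matched literally and needs neither balanced brackets nor escaping. A sketch reusing the $WHATEVER name from the example above; the input lines are invented:

```shell
# Print lines that contain the literal text of $WHATEVER.
# %ENV keeps the value out of the Perl source code, and \Q...\E
# escapes every regex metacharacter in it.
WHATEVER='some arbitrary string ( / # [ etc'
export WHATEVER
printf '%s\n' 'x some arbitrary string ( / # [ etc y' 'no match here' |
  perl -ne 'print if /\Q$ENV{WHATEVER}\E/'
```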

With the help of Jim Lewis, I finally did a test before using sed:
if printf '%s\n' "$1" | grep -q '|'; then
grep ".*$1.*:" "$DB_FILE" | sed "s#^.*$1*.*\(:\)## "
else
grep ".*$1.*:" "$DB_FILE" | sed "s|^.*$1*.*\(:\)|| "
fi
Thanks for the help

Wow. I totally did not know that you could use any character as a delimiter.
At least half the time I use sed and BREs, it's on paths, code snippets, junk characters, things like that. I end up with a bunch of horribly unreadable escapes which I'm not even sure won't die on some combination I didn't think of. But if you can exclude just some character class (or even just one character):
echo '#01Y $#1+!' | sed -e 'sa$#1+ashita' -e 'su#01YuHolyug'
Holy shit!
That's so much easier.

Escaping the delimiter inline for Bash to parse is cumbersome and difficult to read (although, in an address, a custom delimiter does need a backslash in front of its first occurrence, once per address).
To pull together thkala's answer and user4401178's comment:
DELIM=$(echo -en "\001")
sed -n "\\${DELIM}${STARTING_SEARCH_TERM}${DELIM},\\${DELIM}${ENDING_SEARCH_TERM}${DELIM}p" "${FILE}"
This example prints every range of lines that starts at a line matching ${STARTING_SEARCH_TERM} and ends at the next line matching ${ENDING_SEARCH_TERM}, using the SOH (start of heading) character, ASCII code 001, as the address delimiter.
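A small self-contained run of the same kind of range address, with made-up input lines and search terms that themselves contain slashes (printf is used here instead of echo -en to produce the SOH byte):

```shell
# SOH as the delimiter of both addresses in a range; each custom-delimited
# address is introduced with a backslash (\cREGEXc).
DELIM=$(printf '\001')
printf '%s\n' 'a' '/start/' 'b' '/end/' 'c' |
  sed -n "\\${DELIM}/start/${DELIM},\\${DELIM}/end/${DELIM}p"
```

This prints the lines /start/, b and /end/.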

There's no universal separator, but the separator can be escaped with a backslash so that sed does not treat it as a separator (the backslash itself cannot be used as the separator).
Depending on the actual application, it might be handy to just escape those characters in both pattern and replacement.
If you're in a bash environment, you can use bash pattern substitution to escape the sed separator, like this:
safe_replace () {
sed "s/${1//\//\\\/}/${2//\//\\\/}/g"
}
It's pretty self-explanatory, except for the bizarre-looking part.
Explanation:
${1//\//\\\/}
${ - bash expansion starts
1 - first positional argument - the pattern
// - bash pattern substitution pattern separator "replace-all" variant
\/ - literal slash
/ - bash pattern substitution replacement separator
\\ - literal backslash
\/ - literal slash
} - bash expansion ends
example use:
$ input="ka/pus/ta"
$ pattern="/pus/"
$ replacement="/re/"
$ safe_replace "$pattern" "$replacement" <<< "$input"
ka/re/ta

sed: Replace part of a line

How can one replace a part of a line with sed?
The line
DBSERVERNAME xxx
should be replaced with:
DBSERVERNAME yyy
The value xxx can vary and there are two tabs between dbservername and the value. This name-value pair is one of many from a configuration file.
I tried with the following backreference:
echo "DBSERVERNAME xxx" | sed -rne 's/\(dbservername\)[[:blank:]]+\([[:alpha:]]+\)/\1 yyy/gip'
and that resulted in an error: invalid reference \1 on `s' command's RHS.
What's wrong with the expression? I'm using GNU sed.

This works:
sed -rne 's/(dbservername)\s+\w+/\1 yyy/gip'
(When you use the -r option, you don't have to escape the parens.)
Bit of explanation:
-r enables extended regular expressions, which makes a difference to how the regex is written,
-n does not print unless specified - sed prints by default otherwise,
-e means what follows it is an expression. Let's break the expression down:
s/// is the command for search-replace, and what's between the first pair is the regex to match, and the second pair the replacement,
gip, which follows the search replace command; g means global, i.e., every match instead of just the first will be replaced in a line; i is case-insensitivity; p means print when done (remember the -n flag from earlier!),
The parentheses mark a capture group, which will come up later. So dbservername is the first group,
\s matches whitespace, + means one or more (vs *, zero or more) occurrences,
\w matches a word character, that is any letter, digit or underscore (\s and \w are GNU sed extensions),
\1 is a back-reference that inserts the text matched by the first parenthesized group in the search.
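Because -n and the p flag together print only the lines where a substitution happened, running that command on the configuration file itself would discard every other line. A sketch for the whole-file case drops both and also captures the whitespace, so the two tabs survive. The sample lines are invented, and -E is used as the spelling of -r that both GNU and BSD sed accept:

```shell
# Change the value on the DBSERVERNAME line only; all other lines pass through.
# Group 1 keeps the key and its whitespace (e.g. the two tabs) unchanged.
printf 'OTHER_KEY\tvalue\nDBSERVERNAME\t\txxx\nANOTHER_KEY\t2\n' |
  sed -E 's/^(DBSERVERNAME[[:blank:]]+)[^[:blank:]]+/\1yyy/'
```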

Others have already mentioned the escaping of parentheses, but why do you need a back reference at all, if the first part of the line is constant?
You could simply do
sed -e 's/DBSERVERNAME.*$/DBSERVERNAME yyy/'

You're escaping your ( and ). I'm pretty sure you don't need to do that. Try:
sed -rne 's/(dbservername)[[:blank:]]+([[:alpha:]]+)/\1 yyy/gip'

You shouldn't escape the parentheses when you use -r, i.e.:
echo "DBSERVERNAME xxx" | sed -rne 's/(dbservername[[:blank:]]+)([[:alpha:]]+)/\1yyy/gip'

You shouldn't be escaping your parens. Try:
echo "DBSERVERNAME xxx" | sed -rne 's/(dbservername)[[:blank:]]+([[:alpha:]]+)/\1 yyy/gip'

This might work for you:
echo "DBSERVERNAME xxx" | sed 's/\S*$/yyy/'
DBSERVERNAME yyy

Try this
sed -re 's/^(DBSERVERNAME[[:blank:]]+)[^[:blank:]]+/\1yyy/i' temp.txt
or this
awk '{if($1=="DBSERVERNAME") $2 ="yyy"} {print $0;}' temp.txt
