using sed to remove special chars and add spaces instead

using sed to remove special chars and add spaces instead - sed

I have a block of text i'd like to change up:
^#^A^#jfits^#^A^#pin^#^A^#sadface^#^A^#secret^#^A^#test^#^A^#tools^#^A^#ttttfft^#^A^#tty^#^A^#vuln^#^A^#yes^#^
using sed i'd like to remove all the ^#^A^ (and variations of those chars) with a few spaces.
I tried:
cat -A file | sed 's/\^A\^\#/ /'
but thats obviously wrong, can someone help?

if you can enumerate the allowed characters then you can do something like
sed -e 's/[^a-zA-Z0-9]/ /g'
which will replace everything not in the set of alphanumeric characters with a space.
If you just want to replace all 'non-printable' characters with spaces then you can use a character class[1] with
sed -e 's/[^[:print:]]/ /g'
some older versions of sed may not support this syntax though but it is standardized in the unix specification so you should not feel guilty for using it.[2]
[1] http://sed.sourceforge.net/sedfaq3.html
[2] http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_03

It looks like ^A is not two characters, but in fact just one control character. So you should write something like \x01 instead.
Anyway, there are three character ranges, \x00-\x1f are control characters, \x20-\x7f are ascii, and others are... something that depends on encoding.
I don't know sed well, but if you want ascii only, that's how I would've done it in perl:
head /dev/urandom | perl -pe 's/[^\x20-\x7f]/ /gi'

If only replace ^A and ^#, you can use this:
sed 's/[\x01\x0]/ /g' file
Then I find more similar answers in SO which already discussed.
https://superuser.com/questions/75130/how-to-remove-this-symbol-with-vim
Replacing Control Character in sed

Related

Matching strings even if they start with white spaces in SED

I'm having issues matching strings even if they start with any number of white spaces. It's been very little time since I started using regular expressions, so I need some help
Here is an example. I have a file (file.txt) that contains two lines
#String1='Test One'
String1='Test Two'
Im trying to change the value for the second line, without affecting line 1 so I used this
sed -i "s|String1=.*$|String1='Test Three'|g"
This changes the values for both lines. How can I make sed change only the value of the second string?
Thank you

With gnu sed, you match spaces using \s, while other sed implementations usually work with the [[:space:]] character class. So, pick one of these:
sed 's/^\s*AWord/AnotherWord/'
sed 's/^[[:space:]]*AWord/AnotherWord/'
Since you're using -i, I assume GNU sed. Either way, you probably shouldn't retype your word, as that introduces the chance of a typo. I'd go with:
sed -i "s/^\(\s*String1=\).*/\1'New Value'/" file
Move the \s* outside of the parens if you don't want to preserve the leading whitespace.

There are a couple of solutions you could use to go about your problem
If you want to ignore lines that begin with a comment character such as '#' you could use something like this:
sed -i "/^\s*#/! s|String1=.*$|String1='Test Three'|g" file.txt
which will only operate on lines that do not match the regular expression /.../! that begins ^ with optional whiltespace\s* followed by an octothorp #
The other option is to include the characters before 'String' as part of the substitution. Doing it this way means you'll need to capture \(...\) the group to include it in the output with \1
sed -i "s|^\(\s*\)String1=.*$|\1String1='Test Four'|g" file.txt

With GNU sed, try:
sed -i "s|^\s*String1=.*$|String1='Test Three'|" file
or
sed -i "/^\s*String1=/s/=.*/='Test Three'/" file

Using awk you could do:
awk '/String1/ && f++ {$2="Test Three"}1' FS=\' OFS=\' file
#String1='Test One'
String1='Test Three'
It will ignore first hits of string1 since f is not true.

sed and perl not replacing a letter in a file

I have a file 1.htm. I want to replace a letter ṣ (s with dot below). I tried with both sed and perl and it does not replace.
sed -i 's/ṣ/s/g' "1.htm"
perl -i -pe 's/ṣ/s/g' "1.htm"
can anyone suggest what to do
1.html (not replacing ṣ)
Also i have found another strange thing. Sed (same command as above) replaces in one file but not the other I am putting the links
replacable.html
unreplacable.html same as 1.html
Why is it happening so. sed is able to replace ṣ in one file but not the other.

You have combined characters in the html file. That is, the "ṣ" is really a "s" followed by a " ̣" (a COMBINING DOT BELOW). One possibility to fix the oneliner is:
perl -C -i -pe 's/s\x{0323}/s/g' "1.htm"
That is, turn utf8 mode for stdout/stdin on (-C) and explicitely write the two characters in the left side of the s///.
Another possibility is to normalize all the combining characters using Unicode::Normalize, e.g.:
perl -C -MUnicode::Normalize -Mutf8 -i -pe '$_=NFKC($_); s/ṣ/s/g' "1.htm"
But this would also normalize all the other characters in the input file, which may or may not be OK for you.

This might work for you (GNU sed):
sed 's/\o341\o271\o243/s/g' file
To find seds octal interpretation of a character use:
echo 'ṣ'| sed l
This returns (for me):
\341\271\243$
ṣ
Then use \onnn (or combinations of) to find the correct pattern in the lefthandside (LFH) of the substitute command.
N.B. \onnn may also be used in the RHS of the substitute command.

Sed - Capitalize First character of a string ended in a dot

I need to capitalize (no matter what size of the string (4, 3, 2 or 1)) strings ended in a dot. But, the strings that do not end in a dot bust be kept in lowercase.
These 3 commands were capable of doing what i needed, but in 1-char strings ended in dot, and 2-char strings ended in dot.
sed -i -e "/<b>/ {s/\.\([^ ]\)/. \1/g}" file
sed -i -e "/<b>/ {s/\( [a-z]\.\)/\U\1/g}" file
sed -i -e "/<b>/ {s/\([a-z][^ ]\.\)/\u&/g}" file
following my stream, i tought that doing this (below), would make pretty much sense, but it did \not\ worked, and made the 3-char strings look like this: YEs. and the 4-char strings like HAHa.
sed -i -e "/<b>/ {s/\([a-z][^ ][^ ]\.\)/\u&/g}" file
Can someone help? :p (if possible, point me what have i done wrong)

This seems to do what you want:
sed '/<b>/{s/[a-z]*\./\u&/}' input
In your case, [a-z][^ ][^ ]\. only matches 3 letters or more. Instead of forcing the existence of [^ ]'s you can use * to make them optional: [a-z][^ ]*

sed rare-delimiter (other than & | / ?...)

I am using the Unix sed command on a string that can contain all types of characters (&, |, !, /, ?, etc).
Is there a complex delimiter (with two characters?) that can fix the error:
sed: -e expression #1, char 22: unknown option to `s'

The characters in the input file are of no concern - sed parses them fine. There may be an issue, however, if you have most of the common characters in your pattern - or if your pattern may not be known beforehand.
At least on GNU sed, you can use a non-printable character that is highly improbable to exist in your pattern as a delimiter. For example, if your shell is Bash:
$ echo '|||' | sed s$'\001''|'$'\001''/'$'\001''g'
In this example, Bash replaces $'\001' with the character that has the octal value 001 - in ASCII it's the SOH character (start of heading).
Since such characters are control/non-printable characters, it's doubtful that they will exist in the pattern. Unless, that is, you are doing something weird like modifying binary files - or Unicode files without the proper locale settings.

Another way to do this is to use Shell Parameter Substitution.
${parameter/pattern/replace} # substitute replace for pattern once
or
${parameter//pattern/replace} # substitute replace for pattern everywhere
Here is a quite complex example that is difficult with sed:
$ parameter="Common sed delimiters: [sed-del]"
$ pattern="\[sed-del\]"
$ replace="[/_%:\\#]"
$ echo "${parameter//$pattern/replace}"
result is:
Common sed delimiters: [/_%:\#]
However: This only work with bash parameters and not files where sed excel.

There is no such option for multi-character expression delimiters in sed, but I doubt
you need that. The delimiter character should not occur in the pattern, but if it appears in the string being processed, it's not a problem. And unless you're doing something extremely weird, there will always be some character that doesn't appear in your search pattern that can serve as a delimiter.

You need the nested delimiter facility that Perl offers. That allows to use stuff like matching, substituting, and transliterating without worrying about the delimiter being included in your contents. Since perl is a superset of sed, you should be able to use it for whatever you’re used sed for.
Consider this:
$ perl -nle 'print if /something/' inputs
Now if your something contains a slash, you have a problem. The way to fix this is to change delimiter, preferably to a bracketing one. So for example, you could having anything you like in the $WHATEVER shell variable (provided the backets are balanced), which gets interpolated by the shell before Perl is even called here:
$ perl -nle "print if m($WHATEVER)" /usr/share/dict/words
That works even if you have correctly nested parens in $WHATEVER. The four bracketing pairs which correctly nest like this in Perl are < >, ( ), [ ], and { }. They allow arbitrary contents that include the delimiter if that delimiter is balanced.
If it is not balanced, then do not use a delimiter at all. If the pattern is in a Perl variable, you don’t need to use the match operator provided you use the =~ operator, so:
$whatever = "some arbitrary string ( / # [ etc";
if ($line =~ $whatever) { ... }

With the help of Jim Lewis, I finally did a test before using sed :
if [ `echo $1 | grep '|'` ]; then
grep ".*$1.*:" $DB_FILE | sed "s#^.*$1*.*\(:\)## "
else
grep ".*$1.*:" $DB_FILE | sed "s|^.*$1*.*\(:\)|| "
fi
Thanks for help

Wow. I totally did not know that you could use any character as a delimiter.
At least half the time I use the sed and BREs its on paths, code snippets, junk characters, things like that. I end up with a bunch of horribly unreadable escapes which I'm not even sure won't die on some combination I didn't think of. But if you can exclude just some character class (or just one character even)
echo '#01Y $#1+!' | sed -e 'sa$#1+ashita' -e 'su#01YuHolyug'
> > > Holy shit!
That's so much easier.

Escaping the delimiter inline for BASH to parse is cumbersome and difficult to read (although the delimiter does need escaping for sed's benefit when it's first used, per-expression).
To pull together thkala's answer and user4401178's comment:
DELIM=$(echo -en "\001");
sed -n "\\${DELIM}${STARTING_SEARCH_TERM}${DELIM},\\${DELIM}${ENDING_SEARCH_TERM}${DELIM}p" "${FILE}"
This example returns all results starting from ${STARTING_SEARCH_TERM} until ${ENDING_SEARCH_TERM} that don't match the SOH (start of heading) character with ASCII code 001.

There's no universal separator, but it can be escaped by a backslash for sed to not treat it like separator (at least unless you choose a backslash character as separator).
Depending on the actual application, it might be handy to just escape those characters in both pattern and replacement.
If you're in a bash environment, you can use bash substitution to escape sed separator, like this:
safe_replace () {
sed "s/${1//\//\\\/}/${2//\//\\\/}/g"
}
It's pretty self-explanatory, except for the bizarre part.
Explanation to that:
${1//\//\\\/}
${ - bash expansion starts
1 - first positional argument - the pattern
// - bash pattern substitution pattern separator "replace-all" variant
\/ - literal slash
/ - bash pattern substitution replacement separator
\\ - literal backslash
\/ - literal slash
} - bash expansion ends
example use:
$ input="ka/pus/ta"
$ pattern="/pus/"
$ replacement="/re/"
$ safe_replace "$pattern" "$replacement" <<< "$input"
ka/re/ta

How can I remove all non-word characters except the newline?

I have a file like this:
my line - some words & text
oh lóok i've got some characters
I want to 'normalize' it and remove all the non-word characters. I want to end up with something like this:
mylinesomewordstext
ohlóokivegotsomecharacters
I'm using Linux on the command line at the moment, and I'm hoping there's some one-liner I can use.
I tried this:
cat file | perl -pe 's/\W//'
But that removed all the newlines and put everything one line. Is there someway I can tell Perl to not include newlines in the \W? Or is there some other way?

This removes characters that don't match \w or \n:
cat file | perl -C -pe 's/[^\w\n]//g'

#sth's solution uses Perl, which is (at least on my system) not Unicode compatible, thus it loses the accented o character.
On the other hand, sed is Unicode compatible (according to the lists on this page), and gives a correct result:
$ sed 's/\W//g' a.txt
mylinesomewordstext
ohlóokivegotsomecharacters

In Perl, I'd just add the -l switch, which re-adds the newline by appending it to the end of every print():
perl -ple 's/\W//g' file
Notice that you don't need the cat.

The previous response isn't echoing the "ó" character. At least in my case.
sed 's/\W//g' file

Best practices for shell scripting dictate that you should use the tr program for replacing single characters instead of sed, because it's faster and more efficient. Obviously use sed if replacing longer strings.
tr -d '[:blank:][:punct:]' < file
When run with time I get:
real 0m0.003s
user 0m0.000s
sys 0m0.004s
When I run the sed answer (sed -e 's/\W//g' file) with time I get:
real 0m0.003s
user 0m0.004s
sys 0m0.004s
While not a "huge" difference, you'll notice the difference when running against larger data sets. Also please notice how I didn't pipe cat's output into tr, instead using I/O redirection (one less process to spawn).

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

using sed to remove special chars and add spaces instead - sed

If only replace ^A and ^#, you can use this: sed 's/[\x01\x0]/ /g' file Then I find more similar answers in SO which already discussed. https://superuser.com/questions/75130/how-to-remove-this-symbol-with-vim Replacing Control Character in sed

Related

Matching strings even if they start with white spaces in SED

sed and perl not replacing a letter in a file

Sed - Capitalize First character of a string ended in a dot

sed rare-delimiter (other than & | / ?...)

How can I remove all non-word characters except the newline?

Categories

Resources