sed and perl not replacing a letter in a file - perl

I have a file 1.htm. I want to replace a letter ṣ (s with dot below). I tried with both sed and perl and it does not replace.
sed -i 's/ṣ/s/g' "1.htm"
perl -i -pe 's/ṣ/s/g' "1.htm"
can anyone suggest what to do
1.html (not replacing ṣ)
Also i have found another strange thing. Sed (same command as above) replaces in one file but not the other I am putting the links
replacable.html
unreplacable.html same as 1.html
Why is it happening so. sed is able to replace ṣ in one file but not the other.

You have combined characters in the html file. That is, the "ṣ" is really a "s" followed by a " ̣" (a COMBINING DOT BELOW). One possibility to fix the oneliner is:
perl -C -i -pe 's/s\x{0323}/s/g' "1.htm"
That is, turn utf8 mode for stdout/stdin on (-C) and explicitely write the two characters in the left side of the s///.
Another possibility is to normalize all the combining characters using Unicode::Normalize, e.g.:
perl -C -MUnicode::Normalize -Mutf8 -i -pe '$_=NFKC($_); s/ṣ/s/g' "1.htm"
But this would also normalize all the other characters in the input file, which may or may not be OK for you.

This might work for you (GNU sed):
sed 's/\o341\o271\o243/s/g' file
To find seds octal interpretation of a character use:
echo 'ṣ'| sed l
This returns (for me):
\341\271\243$
ṣ
Then use \onnn (or combinations of) to find the correct pattern in the lefthandside (LFH) of the substitute command.
N.B. \onnn may also be used in the RHS of the substitute command.

Related

How to replace a specific character in bash

I want to replace '_v' with a whitespace and the last dot . into a dash "-". I tried using
sed 's/_v/ /' and tr '_v' ' '
Original Text
src-env-package_v1.0.1.18
output
src-en -package 1.0.1.18
Expected Output
src-env-package 1.0.1-18
This might work for you (GNU sed):
sed -E 's/(.*)_v(.*)\./\1 \2-/' file
Use the greed of the .* regexp to find the last occurrence of _v and likewise . and substitute a space for the former and a - for the latter.
If one of the conditions may occur but not necessarily both, use:
sed -E 's/(.*)_v/\1 /;s/(.*)\./\1-/' file
With your shown samples please try following sed code. Using sed's capability to store matched regex values into temp buffer(called capturing groups) here. Also using -E option here to enable ERE(extended regular expressions) for handling regex in better way.
Here is the Online demo for used regex.
sed -E 's/^(src-env-package)_v([0-9]+\..*)\.([0-9]+)$/\1 \2-\3/' Input_file
OR if its a variable value on which you want to run sed command then use following:
var="src-env-package_v1.0.1.18"
sed -E 's/^(src-env-package)_v([0-9]+\..*)\.([0-9]+)$/\1 \2-\3/' <<<"$var"
src-env-package 1.0.1-18
Bonus solution: Adding a perl one-liner solution here, using capturing groups concept(as explained above) in perl and getting the values as per requirement.
perl -pe 's/^(src-env-package)_v((?:[0-9]+\.){1,}[0-9]+)\.([0-9]+)$/\1 \2-\3/' Input_file

Why is my sed multiline find-and-replace not working as expected?

I have a simple sed command that I am using to replace everything between (and including) //thistest.com-- and --thistest.com with nothing (remove the block all together):
sudo sed -i "s#//thistest\.com--.*--thistest\.com##g" my.file
The contents of my.file are:
//thistest.com--
zone "awebsite.com" {
type master;
file "some.stuff.com.hosts";
};
//--thistest.com
As I am using # as my delimiter for the regex, I don't need to escape the / characters. I am also properly (I think) escaping the . in .com. So I don't see exactly what is failing.
Why isn't the entire block being replaced?
You have two problems:
Sed doesn't do multiline pattern matches—at least, not the way you're expecting it to. However, you can use multiline addresses as an alternative.
Depending on your version of sed, you may need to escape alternate delimiters, especially if you aren't using them solely as part of a substitution expression.
So, the following will work with your posted corpus in both GNU and BSD flavors:
sed '\#^//thistest\.com--#, \#^//--thistest\.com# d' /tmp/corpus
Note that in this version, we tell sed to match all lines between (and including) the two patterns. The opening delimiter of each address pattern is properly escaped. The command has also been changed to d for delete instead of s for substitute, and some whitespace was added for readability.
I've also chosen to anchor the address patterns to the start of each line. You may or may not find that helpful with this specific corpus, but it's generally wise to do so when you can, and doesn't seem to hurt your use case.
# separation by line with 1 s//
sed -n -e 'H;${x;s#^\(.\)\(.*\)\1//thistest.com--.*\1//--thistest.com#\2#;p}' YourFile
# separation by line with address pattern
sed -e '\#//thistest.com--#,\#//--thistest.com# d' YourFile
# separation only by char (could be CR, CR/LF, ";" or "oneline") with s//
sed -n -e '1h;1!H;${x;s#//thistest.com--.*\1//--thistest.com##;p}' YourFile
Note:
assuming there is only 1 section thistest per file (if not, it remove anything between the first opening until the last closing section) for the use of s//
does not suite for huge file (load entire file into memory) with s//
sed using addresses pattern cannot select section on the same line, it search 1st pattern to start, and a following line to stop but very efficient on big file and/or multisection

sed command not working properly on ubuntu

I have one file named `config_3_setConfigPW.ldif? containing the following line:
{pass}
on terminal, I used following commands
SLAPPASSWD=Pwd&0011
sed -i "s#{pass}#$SLAPPASSWD#" config_3_setConfigPW.ldif
It should replace {pass} to Pwd&0011 but it generates Pwd{pass}0011.
The reason is that the SLAPPASSWD shell variable is expanded before sed sees it. So sed sees:
sed -i "s#{pass}#Pwd&0011#" config_3_setConfigPW.ldif
When an "&" is on the right hand side of a pattern it means "copy the matched input", and in your case the matched input is "{pass}".
The real problem is that you would have to escape all the special characters that might arise in SLAPPASSWD, to prevent sed doing this. For example, if you had character "#" in the password, sed would think it was the end of the substitute command, and give a syntax error.
Because of this, I wouldn't use sed for this. You could try gawk or perl?
eg, this will print out the modified file in awk (though it still assumes that SLAPPASSWD contains no " character
awk -F \{pass\} ' { print $1"'${SLAPPASSWD}'"$2 } ' config_3_setConfigPW.ldif
That's because$SLAPPASSWD contains the character sequences & which is a metacharacter used by sed and evaluates to the matched text in the s command. Meaning:
sed 's/{pass}/match: &/' <<< '{pass}'
would give you:
match: {pass}
A time ago I've asked this question: "Is it possible to escape regex metacharacters reliably with sed". Answers there show how to reliably escape the password before using it as the replacement part:
pwd="Pwd&0011"
pwdEscaped="$(sed 's/[&/\]/\\&/g' <<< "$pwd")"
# Now you can safely pass $pwd to sed
sed -i "s/{pass}/$pwdEscaped/" config_3_setConfigPW.ldif
Bear in mind that sed NEVER operates on strings. The thing sed searches for is a regexp and the thing it replaces it with is string-like but has some metacharacters you need to be aware of, e.g. & or \<number>, and all of it needs to avoid using the sed delimiters, / typically.
If you want to operate on strings you need to use awk:
awk -v old="{pass}" -v new="$SLAPPASSWD" 's=index($0,old){ $0 = substr($0,1,s-1) new substr($0,s+length(old))} 1' file
Even the above would need tweaked if old or new contained escape characters.

Matching strings even if they start with white spaces in SED

I'm having issues matching strings even if they start with any number of white spaces. It's been very little time since I started using regular expressions, so I need some help
Here is an example. I have a file (file.txt) that contains two lines
#String1='Test One'
String1='Test Two'
Im trying to change the value for the second line, without affecting line 1 so I used this
sed -i "s|String1=.*$|String1='Test Three'|g"
This changes the values for both lines. How can I make sed change only the value of the second string?
Thank you
With gnu sed, you match spaces using \s, while other sed implementations usually work with the [[:space:]] character class. So, pick one of these:
sed 's/^\s*AWord/AnotherWord/'
sed 's/^[[:space:]]*AWord/AnotherWord/'
Since you're using -i, I assume GNU sed. Either way, you probably shouldn't retype your word, as that introduces the chance of a typo. I'd go with:
sed -i "s/^\(\s*String1=\).*/\1'New Value'/" file
Move the \s* outside of the parens if you don't want to preserve the leading whitespace.
There are a couple of solutions you could use to go about your problem
If you want to ignore lines that begin with a comment character such as '#' you could use something like this:
sed -i "/^\s*#/! s|String1=.*$|String1='Test Three'|g" file.txt
which will only operate on lines that do not match the regular expression /.../! that begins ^ with optional whiltespace\s* followed by an octothorp #
The other option is to include the characters before 'String' as part of the substitution. Doing it this way means you'll need to capture \(...\) the group to include it in the output with \1
sed -i "s|^\(\s*\)String1=.*$|\1String1='Test Four'|g" file.txt
With GNU sed, try:
sed -i "s|^\s*String1=.*$|String1='Test Three'|" file
or
sed -i "/^\s*String1=/s/=.*/='Test Three'/" file
Using awk you could do:
awk '/String1/ && f++ {$2="Test Three"}1' FS=\' OFS=\' file
#String1='Test One'
String1='Test Three'
It will ignore first hits of string1 since f is not true.

How can I remove all non-word characters except the newline?

I have a file like this:
my line - some words & text
oh lóok i've got some characters
I want to 'normalize' it and remove all the non-word characters. I want to end up with something like this:
mylinesomewordstext
ohlóokivegotsomecharacters
I'm using Linux on the command line at the moment, and I'm hoping there's some one-liner I can use.
I tried this:
cat file | perl -pe 's/\W//'
But that removed all the newlines and put everything one line. Is there someway I can tell Perl to not include newlines in the \W? Or is there some other way?
This removes characters that don't match \w or \n:
cat file | perl -C -pe 's/[^\w\n]//g'
#sth's solution uses Perl, which is (at least on my system) not Unicode compatible, thus it loses the accented o character.
On the other hand, sed is Unicode compatible (according to the lists on this page), and gives a correct result:
$ sed 's/\W//g' a.txt
mylinesomewordstext
ohlóokivegotsomecharacters
In Perl, I'd just add the -l switch, which re-adds the newline by appending it to the end of every print():
perl -ple 's/\W//g' file
Notice that you don't need the cat.
The previous response isn't echoing the "ó" character. At least in my case.
sed 's/\W//g' file
Best practices for shell scripting dictate that you should use the tr program for replacing single characters instead of sed, because it's faster and more efficient. Obviously use sed if replacing longer strings.
tr -d '[:blank:][:punct:]' < file
When run with time I get:
real 0m0.003s
user 0m0.000s
sys 0m0.004s
When I run the sed answer (sed -e 's/\W//g' file) with time I get:
real 0m0.003s
user 0m0.004s
sys 0m0.004s
While not a "huge" difference, you'll notice the difference when running against larger data sets. Also please notice how I didn't pipe cat's output into tr, instead using I/O redirection (one less process to spawn).