Sed - Capitalize the first character of a string ending in a dot

I need to capitalize strings that end in a dot, no matter how long the string is (4, 3, 2 or 1 characters). Strings that do not end in a dot must be kept in lowercase.
These 3 commands do what I need, but only for 1-char and 2-char strings ending in a dot:
sed -i -e "/<b>/ {s/\.\([^ ]\)/. \1/g}" file
sed -i -e "/<b>/ {s/\( [a-z]\.\)/\U\1/g}" file
sed -i -e "/<b>/ {s/\([a-z][^ ]\.\)/\u&/g}" file
Following the same logic, I thought that the command below would make sense, but it did not work: it made the 3-char strings look like YEs. and the 4-char strings like HAHa.
sed -i -e "/<b>/ {s/\([a-z][^ ][^ ]\.\)/\u&/g}" file
Can someone help? :p (If possible, point out what I have done wrong.)

This seems to do what you want:
sed '/<b>/{s/[a-z]*\./\u&/}' input
In your case, [a-z][^ ][^ ]\. requires at least three characters before the dot, so it cannot match shorter strings. Instead of forcing the existence of the [^ ]'s, you can use * to make them optional: [a-z][^ ]*
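As a quick check, here is a minimal sketch of that substitution on made-up input (the sample lines are illustrative, not taken from the question). It assumes GNU sed, since \u is a GNU extension, and adds the g flag so that every dot-terminated word on a <b> line is handled.

```shell
# Sample input: only the first line contains <b>, so only it is edited.
input='<b> a. no yes. haha
plain a. yes.'

# [a-z]*\. matches a run of lowercase letters followed by a dot;
# \u uppercases the first character of the match (GNU sed extension).
out=$(printf '%s\n' "$input" | sed '/<b>/{s/[a-z]*\./\u&/g}')
printf '%s\n' "$out"
```

Words without a trailing dot ("no", "haha") are left in lowercase, and lines without <b> are not touched at all.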

Related

sed command not working properly on ubuntu

I have a file named `config_3_setConfigPW.ldif` containing the following line:
{pass}
On the terminal, I used the following commands:
SLAPPASSWD='Pwd&0011'
sed -i "s#{pass}#$SLAPPASSWD#" config_3_setConfigPW.ldif
It should replace {pass} with Pwd&0011, but it generates Pwd{pass}0011.
The reason is that the SLAPPASSWD shell variable is expanded before sed sees it. So sed sees:
sed -i "s#{pass}#Pwd&0011#" config_3_setConfigPW.ldif
When an "&" appears in the replacement (the right-hand side of the s command), it means "copy the matched input", and in your case the matched input is "{pass}".
The real problem is that you would have to escape all the special characters that might arise in SLAPPASSWD, to prevent sed doing this. For example, if you had character "#" in the password, sed would think it was the end of the substitute command, and give a syntax error.
Because of this, I wouldn't use sed for this. You could try gawk or perl?
E.g., this awk command prints the modified file (though it still assumes that SLAPPASSWD contains no " character):
awk -F \{pass\} ' { print $1"'${SLAPPASSWD}'"$2 } ' config_3_setConfigPW.ldif
That's because $SLAPPASSWD contains the character &, which is a metacharacter in sed: in the replacement part of the s command it evaluates to the matched text. Meaning:
sed 's/{pass}/match: &/' <<< '{pass}'
would give you:
match: {pass}
A while ago I asked this question: "Is it possible to escape regex metacharacters reliably with sed". The answers there show how to reliably escape the password before using it as the replacement part:
pwd="Pwd&0011"
pwdEscaped="$(sed 's/[&/\]/\\&/g' <<< "$pwd")"
# Now you can safely pass $pwdEscaped to sed
sed -i "s/{pass}/$pwdEscaped/" config_3_setConfigPW.ldif
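To see the escaping at work without touching a real file, the sketch below wraps the same sed expression in a helper (escape_replacement is a hypothetical name) and uses a made-up password containing both & and /. It assumes / is the delimiter of the final s command and that the password is a single line.

```shell
# Backslash-escape the characters that are special in a sed replacement
# when "/" is the delimiter: "&", "/" and "\".
escape_replacement() {
  printf '%s\n' "$1" | sed 's/[&/\]/\\&/g'
}

pwd_raw='Pwd&00/11'                          # made-up sample password
pwd_escaped=$(escape_replacement "$pwd_raw")

# Apply the substitution to a sample line instead of editing a file in place.
result=$(printf '%s\n' 'userPassword: {pass}' | sed "s/{pass}/$pwd_escaped/")
printf '%s\n' "$result"
```

A password containing a newline would still need extra handling, which this sketch does not attempt.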
Bear in mind that sed NEVER operates on strings. The thing sed searches for is a regexp and the thing it replaces it with is string-like but has some metacharacters you need to be aware of, e.g. & or \<number>, and all of it needs to avoid using the sed delimiters, / typically.
If you want to operate on strings you need to use awk:
awk -v old="{pass}" -v new="$SLAPPASSWD" 's=index($0,old){ $0 = substr($0,1,s-1) new substr($0,s+length(old))} 1' file
Even the above would need to be tweaked if old or new contained escape characters, because awk interprets escape sequences in values assigned with -v.
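One possible tweak, sketched here under the assumption of a POSIX awk, is to pass both strings through the environment and read them from ENVIRON, which does not go through the escape-sequence processing that -v applies. The function name replace_literal and the sample values are made up for illustration.

```shell
# Replace the first literal occurrence of $1 with $2 on each input line.
# The strings reach awk through the environment, so backslashes and "&"
# are taken literally.
replace_literal() {
  old=$1 new=$2 awk '
    BEGIN { old = ENVIRON["old"]; new = ENVIRON["new"] }
    (s = index($0, old)) { $0 = substr($0, 1, s - 1) new substr($0, s + length(old)) }
    1'
}

out=$(printf '%s\n' 'pw={pass}' | replace_literal '{pass}' 'Pwd&00\n11')
printf '%s\n' "$out"
```

Here the two characters \n in the replacement stay a backslash and an n instead of turning into a newline.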

Matching strings even if they start with white spaces in SED

I'm having issues matching strings even if they start with any number of white spaces. I only recently started using regular expressions, so I need some help.
Here is an example. I have a file (file.txt) that contains two lines:
#String1='Test One'
String1='Test Two'
I'm trying to change the value on the second line without affecting line 1, so I used this:
sed -i "s|String1=.*$|String1='Test Three'|g" file.txt
This changes the values on both lines. How can I make sed change only the value of the second string?
Thank you
With GNU sed, you can match whitespace using \s, while other sed implementations usually work with the [[:space:]] character class. So, pick one of these:
sed 's/^\s*AWord/AnotherWord/'
sed 's/^[[:space:]]*AWord/AnotherWord/'
Since you're using -i, I assume GNU sed. Either way, you probably shouldn't retype your word, as that introduces the chance of a typo. I'd go with:
sed -i "s/^\(\s*String1=\).*/\1'New Value'/" file
Move the \s* outside of the parens if you don't want to preserve the leading whitespace.
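A small sketch of the anchored form on the two-line sample from the question (the second line is indented here to show that leading whitespace is preserved). It uses [[:space:]] rather than \s, so it should not depend on GNU sed, and it prints the result instead of editing in place.

```shell
input="#String1='Test One'
  String1='Test Two'"

# ^ plus optional whitespace means the commented line (starting with #)
# cannot match; \1 puts the captured prefix back unchanged.
out=$(printf '%s\n' "$input" | sed "s/^\([[:space:]]*String1=\).*/\1'Test Three'/")
printf '%s\n' "$out"
```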
There are a couple of ways to approach this problem.
If you want to ignore lines that begin with a comment character such as '#', you could use something like this:
sed -i "/^\s*#/! s|String1=.*$|String1='Test Three'|g" file.txt
which only operates on lines that do not match (/.../!) the regular expression: start of line (^), optional whitespace (\s*), then an octothorpe (#).
The other option is to include the characters before 'String1' as part of the substitution. Doing it this way means you'll need to capture the group with \(...\) to include it in the output with \1:
sed -i "s|^\(\s*\)String1=.*$|\1String1='Test Four'|g" file.txt
With GNU sed, try:
sed -i "s|^\s*String1=.*$|String1='Test Three'|" file
or
sed -i "/^\s*String1=/s/=.*/='Test Three'/" file
Using awk you could do:
awk '/String1/ && f++ {$2="Test Three"}1' FS=\' OFS=\' file
#String1='Test One'
String1='Test Three'
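It skips the first line that matches String1 because f++ evaluates to 0 (false) the first time; on every later match f is non-zero, so the value is replaced.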

using sed to remove special chars and add spaces instead

I have a block of text I'd like to change:
^#^A^#jfits^#^A^#pin^#^A^#sadface^#^A^#secret^#^A^#test^#^A^#tools^#^A^#ttttfft^#^A^#tty^#^A^#vuln^#^A^#yes^#^
Using sed, I'd like to replace all the ^#^A^ sequences (and variations of those chars) with a few spaces.
I tried:
cat -A file | sed 's/\^A\^\#/ /'
but that's obviously wrong. Can someone help?
If you can enumerate the allowed characters, then you can do something like:
sed -e 's/[^a-zA-Z0-9]/ /g'
which will replace everything not in the set of alphanumeric characters with a space.
If you just want to replace all 'non-printable' characters with spaces then you can use a character class[1] with
sed -e 's/[^[:print:]]/ /g'
Some older versions of sed may not support this syntax, but it is standardized in the Unix specification, so you should not feel guilty for using it.[2]
[1] http://sed.sourceforge.net/sedfaq3.html
[2] http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_03
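As an illustration of the character-class approach, the sketch below feeds a few made-up bytes (NUL and \001 around two words) through the same substitution. It assumes GNU sed, which copes with NUL bytes in its input, and pins the locale to C so that [:print:] means printable ASCII.

```shell
# printf emits NUL (\000) and ^A (\001) between the words;
# every non-printable byte is replaced by one space.
out=$(printf '\000\001\000jfits\000\001\000pin\n' | LC_ALL=C sed 's/[^[:print:]]/ /g')
printf '%s\n' "$out"
```

Each control byte becomes exactly one space, so the three-byte separators turn into three spaces.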
It looks like ^A is not two characters, but in fact just one control character. So you should write something like \x01 instead.
Anyway, there are three character ranges: \x00-\x1f are control characters, \x20-\x7e are printable ASCII (\x7f is DEL, another control character), and the rest depends on the encoding.
I don't know sed well, but if you want printable ASCII only, this is how I would do it in perl (newlines are kept):
head /dev/urandom | perl -pe 's/[^\x20-\x7e\n]/ /g'
If you only want to replace ^A and ^#, you can use this (GNU sed):
sed 's/[\x00\x01]/ /g' file
Similar questions have already been discussed:
https://superuser.com/questions/75130/how-to-remove-this-symbol-with-vim
Replacing Control Character in sed

sed and perl not replacing a letter in a file

I have a file 1.htm. I want to replace a letter ṣ (s with dot below). I tried with both sed and perl and it does not replace.
sed -i 's/ṣ/s/g' "1.htm"
perl -i -pe 's/ṣ/s/g' "1.htm"
Can anyone suggest what to do?
1.html (not replacing ṣ)
I have also found another strange thing: sed (same command as above) replaces the letter in one file but not in the other. Here are the links:
replacable.html
unreplacable.html same as 1.html
Why is this happening? sed is able to replace ṣ in one file but not the other.
You have combining characters in the HTML file. That is, the "ṣ" is really an "s" followed by a " ̣" (a COMBINING DOT BELOW). One possibility to fix the one-liner is:
perl -C -i -pe 's/s\x{0323}/s/g' "1.htm"
That is, turn on UTF-8 mode for stdin/stdout (-C) and explicitly write the two characters on the left side of the s///.
Another possibility is to normalize all the combining characters using Unicode::Normalize, e.g.:
perl -C -MUnicode::Normalize -Mutf8 -i -pe '$_=NFKC($_); s/ṣ/s/g' "1.htm"
But this would also normalize all the other characters in the input file, which may or may not be OK for you.
This might work for you (GNU sed):
sed 's/\o341\o271\o243/s/g' file
To find sed's octal representation of a character, use:
echo 'ṣ'| sed l
This returns (for me):
\341\271\243$
ṣ
Then use \onnn (or combinations of them) to build the correct pattern in the left-hand side (LHS) of the substitute command.
N.B. \onnn may also be used in the RHS of the substitute command.
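Putting the two answers together, here is a hedged sketch that handles both encodings of the letter: the precomposed ṣ (bytes \341\271\243) and a plain s followed by COMBINING DOT BELOW (U+0323, bytes \314\243 in UTF-8). It assumes GNU sed for the \onnn escapes and sets LC_ALL=C so that sed matches raw bytes; strip_dot_below is a made-up helper name.

```shell
# Replace both byte sequences for "s with dot below" by a plain "s".
strip_dot_below() {
  LC_ALL=C sed -e 's/\o341\o271\o243/s/g' -e 's/s\o314\o243/s/g'
}

# printf builds each form from its octal bytes, followed by the letter "a".
precomposed=$(printf '\341\271\243a\n' | strip_dot_below)
decomposed=$(printf 's\314\243a\n' | strip_dot_below)
printf '%s %s\n' "$precomposed" "$decomposed"
```

This would explain why the same command works on one file and not the other: the two files can look identical on screen while containing different byte sequences.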

Remove a hyphen from a specific line in a file

I have a data file that needs to have several unique identifiers stripped of hyphens.
So I have:
(Special_Section "data-values")
and I want to have it replaced with:
(Special_Section "datavalues")
I wanted to use a simple sed find/replace, but the data and values are different each time. Preferably, I'd run this in place, since the file has a lot of other information I want to keep intact.
Does sed or awk have a way to remove the hyphen from the matched portion only?
Currently I can match with: sed -i 's/Special_Section "[a-zA-Z0-9]*-[a-zA-Z0-9]*"/&/g' *myfiles*
But I would like to then run s/-// on & if that's possible.
You seem to be using GNU sed, so something like this might work:
sed -ri '
s/(Special_Section [^-]*)-([^)]*)/\1\2/g
' <your_filename_glob>
Does this work?
sed -i '/(Special_Section ".*-.*")/{s/-//}' yourFile
Close - scan for the lines and then substitute on those that match:
sed -i '/Special_Section "[a-zA-Z0-9]*-[a-zA-Z0-9]*"/s/\( "[a-zA-Z0-9]*\)-\([a-zA-Z0-9]*\)"/\1\2/' *myfiles*
You can split that over several lines to avoid the scroll bar in SO:
sed -i '/Special_Section "[a-zA-Z0-9]*-[a-zA-Z0-9]*"/{
s/\( "[a-zA-Z0-9]*\)-\([a-zA-Z0-9]*\)"/\1\2/
}' *myfiles*
And on further thought, you can also do:
sed -i 's/\(Special_Section "[a-zA-Z0-9]*\)-\([a-zA-Z0-9]*"\)/\1\2/' *myfiles*
This is more compact. You can add the g qualifier if you need it. Both solutions use the special \(...\) notation to capture parts of the regular expression.
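A minimal sketch of the compact form on made-up input, printing the result instead of editing in place. The second command is an extension that is not in the answers above: the g flag does not help when a single identifier contains several hyphens, so it loops with a label and the t command until no hyphen is left inside the quoted value.

```shell
input='(Special_Section "data-values")
(Other_Section "keep-this")'

# Capture the text on both sides of the hyphen and drop the hyphen itself.
out=$(printf '%s\n' "$input" |
  sed 's/\(Special_Section "[a-zA-Z0-9]*\)-\([a-zA-Z0-9]*"\)/\1\2/')
printf '%s\n' "$out"

# Hypothetical variant for identifiers with more than one hyphen:
# "ta" jumps back to label "a" for as long as a substitution succeeds.
multi=$(printf '%s\n' '(Special_Section "a-b-c")' |
  sed -e ':a' -e 's/\(Special_Section "[^"-]*\)-/\1/' -e 'ta')
printf '%s\n' "$multi"
```

Lines that do not contain Special_Section keep their hyphens in both cases.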