Removing whitespace around a certain character with sed

What would be the best way to remove whitespace only around a certain character? Let's say a dash: Some- String- 12345- Here would become Some-String-12345-Here. Something like sed 's/\ -/-/g;s/-\ /-/g' works, but I am sure there must be a better way.
Thanks!

If you mean all whitespace, not just spaces, then you could try \s:
echo 'Some- String- 12345- Here' | sed 's/\s*-\s*/-/g'
Output:
Some-String-12345-Here
Or use the [:space:] character class:
echo 'Some- String- 12345- Here' | sed 's/[[:space:]]*-[[:space:]]*/-/g'
\s is a GNU extension, so other versions of sed may or may not support it, but GNU sed does. The [[:space:]] class is defined by POSIX and is the more portable choice.
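As a minimal sketch, the [[:space:]] form can be wrapped in a small shell function. The name trim_around_dash is hypothetical, and the example assumes a sed that supports POSIX character classes.

```shell
# trim_around_dash is a hypothetical helper name, not part of sed.
# It removes any run of whitespace on either side of every dash in $1.
trim_around_dash() {
  printf '%s\n' "$1" | sed 's/[[:space:]]*-[[:space:]]*/-/g'
}

trim_around_dash 'Some - String- 12345 -Here'   # prints Some-String-12345-Here
```

Using printf instead of echo avoids surprises when the input starts with a dash or contains backslashes.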

Try:
's/ *- */-/g'

You can use awk as well:
$ echo 'Some - String- 12345-' | awk -F" *- *" '{$1=$1}1' OFS="-"
Some-String-12345-
If it's just "- " as in your example, Bash parameter expansion is enough:
$ s="Some- String- 12345-"
$ echo ${s//- /-}
Some-String-12345-

Related

How to remove after second period in a string using sed

In my script, I have a version number such as 15.03.2 stored in the variable $STRING. These numbers always change. I want to strip it down to 15.03 (or whatever it will be next time).
How do I remove everything after the second . using sed?
Something like:
$(echo "$STRING" | sed "s/\.^$\.//")
(I don't know what ^, $ and others do, but they look related, so I just guessed.)
I think the better tool here is cut
echo '15.03.2' | cut -d . -f -2
This might work for you (GNU sed):
sed 's/\.[^.]*//2g' file
This removes the second and every later occurrence of a period followed by zero or more non-period characters.
$ echo '15.03.2' | sed 's/\([^.]*\.[^.]*\)\..*/\1/'
15.03
More generally, to skip N periods and cut at the next one:
$ echo '15.03.2.3.4.5' | sed -E 's/(([^.]*\.){2}[^.]*)\..*/\1/'
15.03.2
$ echo '15.03.2.3.4.5' | sed -E 's/(([^.]*\.){3}[^.]*)\..*/\1/'
15.03.2.3
$ echo '15.03.2.3.4.5' | sed -E 's/(([^.]*\.){4}[^.]*)\..*/\1/'
15.03.2.3.4
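The repetition count in the pattern above can be filled in from a shell variable. The following is only a sketch: strip_after_period is a hypothetical name, and it assumes a sed that accepts -E (GNU and BSD sed both do).

```shell
# strip_after_period is a hypothetical helper.
# $1 = which period to cut at (2 means "remove the second period and everything after it")
# $2 = the string to process
strip_after_period() {
  skip=$(( $1 - 1 ))   # number of periods to keep before the cut
  printf '%s\n' "$2" | sed -E "s/^(([^.]*\.){$skip}[^.]*)\..*/\1/"
}

strip_after_period 2 '15.03.2'         # prints 15.03
strip_after_period 3 '15.03.2.3.4.5'   # prints 15.03.2
```

If the string contains fewer periods than requested, the pattern does not match and the input is printed unchanged.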

Uppercase to Lowercase with Sed and character classes

I'd like to convert a string from upper to lower case. I know there are different ways of solving this problem, but I'd like to understand why this command doesn't work:
echo "aa" | sed 's/'[:upper:]'/'[:lower:]'/g'
Is it a wrong way to use the classes of characters?
To go from lowercase to uppercase with GNU sed, you can use \U (\L does the reverse):
echo "aW123bR" | sed -r 's/[a-z]+/\U&/g'
The tr command is an interesting alternative:
echo "aW123bR" | tr '[:lower:]' '[:upper:]'
In sed, the y command is used for mapping sets of characters:
sed 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/'
It requires a literal list of characters, not character classes.
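A short sketch of the difference, using only POSIX sed features. A character class is valid only inside a bracket expression on the pattern side, and the replacement side does not accept classes at all, which is why the y command with literal lists is needed. The helper name to_lower is hypothetical.

```shell
# [[:upper:]] works as a pattern, but the replacement can only be fixed text:
printf '%s\n' 'aBc' | sed 's/[[:upper:]]/X/g'   # prints aXc

# to_lower is a hypothetical helper built on the y command,
# which maps each character of the first list to the one at the same position in the second.
to_lower() {
  printf '%s\n' "$1" | sed 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/'
}

to_lower 'Hello World 42'   # prints hello world 42
```

This only covers the 26 ASCII letters listed; tr '[:upper:]' '[:lower:]' follows the locale's definition of the classes instead.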
Another possible solution with gawk:
$ echo "HELLO" | awk '{print tolower($0)}'
hello

How do I get rid of this unicode character?

Any idea how to get rid of the irritating character U+0092 in a bunch of text files? I've tried all of the commands below, but none of them work. The character map lists it as a control character, U+0092.
sed -i 's/\xc2\x92//' *
sed -i 's/\u0092//' *
sed -i 's///' *
Ah, I've found a way:
CHARS=$(python2 -c 'print u"\u0092".encode("utf8")')
sed 's/['"$CHARS"']//g'
But is there a direct sed method for this?
Try sed "s/\`//g" *. (I added the g so it will remove all the backticks it finds).
EDIT: It's not a backtick that OP wants to remove.
Following the solution in this question, this ought to work:
sed 's/\xc2\x92//g'
To demonstrate it does:
$ CHARS=$(python2 -c 'print u"asdf\u0092asdf".encode("utf8")')
$ echo $CHARS
asdf<funny glyph symbol>asdf
$ echo $CHARS | sed 's/\xc2\x92//g'
asdfasdf
Seeing as it's something you tried already, perhaps what is in your text file is not U+0092?
This might work for you (GNU sed):
echo "string containing funny character(s)" | sed -n 'l0'
This displays the string as sed sees it, with non-printable bytes shown as octal escapes. Then use:
echo "string containing funny character(s)" | sed 's/\onnn//g'
Where nnn is the octal value, to delete it/them.
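For a variant that does not rely on GNU escapes or on Python, the bytes can be built with printf, which accepts octal escapes in POSIX shells. This is a sketch under the assumption that the files are UTF-8, where U+0092 is encoded as the byte pair 0xC2 0x92 (octal 302 222). The name strip_u0092 is hypothetical.

```shell
# strip_u0092 is a hypothetical helper. It reads stdin and deletes every
# UTF-8 encoded U+0092 (bytes 0xC2 0x92, octal 302 222).
strip_u0092() {
  bad=$(printf '\302\222')   # build the two bytes with printf octal escapes
  sed "s/$bad//g"
}

printf 'asdf\302\222asdf\n' | strip_u0092   # prints asdfasdf
```

If the files use a single-byte encoding such as Latin-1 instead, the character is the lone byte 0x92 (octal 222) and the pattern has to be adjusted accordingly.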

Trim text using sed

How do I remove the first and the last quotes?
echo "\"test\"" | sed 's/"//' | sed 's/"$//'
The above works as expected, but I guess there must be a better way.
You can combine the sed calls into one:
echo "\"test\"" | sed 's/"//;s/"$//'
The command you posted will remove the first quote even if it's not at the beginning of the line. If you want to make sure that it's only done if it is at the beginning, then you can anchor it like this:
echo "\"test\"" | sed 's/^"//;s/"$//'
Some versions of sed don't like multiple commands separated by semicolons. For them you can do this (it also works in the ones that accept semicolons):
echo "\"test\"" | sed -e 's/^"//' -e 's/"$//'
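As a sketch, the anchored form can be wrapped in a function to make the behavior easy to check. The name strip_outer_quotes is hypothetical. Only a quote at the very start and a quote at the very end are removed, so quotes inside the string survive.

```shell
# strip_outer_quotes is a hypothetical helper.
# It removes one leading and one trailing double quote from $1, if present.
strip_outer_quotes() {
  printf '%s\n' "$1" | sed -e 's/^"//' -e 's/"$//'
}

strip_outer_quotes '"test"'    # prints test
strip_outer_quotes '"a"b"'     # prints a"b
```

Because the two substitutions are independent, a string with only one of the two quotes still loses that quote.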
Maybe you prefer something like this:
echo '"test"' | sed 's/^"\(.*\)"$/\1/'
If you are sure there are no other quotes besides the first and last, just use the g modifier:
$ echo "\"test\"" | sed 's/"//g'
test
If you have Ruby (1.9+):
$ echo $s
blah"te"st"test
$ echo $s | ruby -e 's=gets.split("\"");print "#{s[0]}#{s[1..-2].join("\"")+s[-1]}"'
blahte"sttest
Note that this removes the first and last quotes even when they are not exactly at the first and last positions of the string.
An example with more quotes:
$ s='bl"ah"te"st"tes"t'
$ echo $s | ruby -e 's=gets.split("\"");print "#{s[0]}#{s[1..-2].join("\"")+s[-1]}"'
blah"te"st"test

Replacing the last word of a path using sed

I have the following: param="/var/tmp/test"
I need to replace the word test with another word, such as new_test.
I need a smart way to replace the last word after "/" with sed.
echo 'param="/var/tmp/test"' | sed 's/\/[^\/]*"/\/REPLACEMENT"/'
param="/var/tmp/REPLACEMENT"
echo '/var/tmp/test' | sed 's/\/[^\/]*$/\/REPLACEMENT/'
/var/tmp/REPLACEMENT
Extracting bits and pieces with sed is a bit messy (as Jim Lewis says, use basename and dirname if you can). If you do go the sed route, at least you don't need a plethora of backslashes, because the delimiter character is selectable. I like to use ! when / is too awkward, but the choice is arbitrary:
$ echo 'param="/var/tmp/test"' | sed ' s!/[^/"]*"!/new_test"! '
param="/var/tmp/new_test"
We can also extract just the part that was substituted, though this is easier with two substitutions in the sed control script:
$ echo 'param="/var/tmp/test"' | sed ' s!.*/!! ; s/"$// '
test
You don't need sed for this...basename and dirname are a better choice for assembling or disassembling pathnames. All those escape characters give me a headache....
param="/var/tmp/test"
param_repl="$(dirname "$param")/newtest"
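A minimal sketch of the dirname approach as a reusable function, followed by an equivalent that uses only POSIX parameter expansion. The name replace_last is hypothetical, and both forms assume the path has a parent directory other than the root (for /test, the dirname form yields //new_test).

```shell
# replace_last is a hypothetical helper: $1 is the path, $2 the new last component.
replace_last() {
  printf '%s/%s\n' "$(dirname "$1")" "$2"
}

replace_last /var/tmp/test new_test   # prints /var/tmp/new_test

# The same result without an external command:
# ${param%/*} strips the shortest suffix matching "/*", i.e. the last component.
param="/var/tmp/test"
printf '%s\n' "${param%/*}/new_test"   # prints /var/tmp/new_test
```

Quoting "$1" and "$param" keeps paths that contain spaces intact.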
It's not clear whether param is part of the string that you need processed or it's the variable that holds the string. Assuming the latter, you can do this using only Bash (you don't say which shell you're using):
shopt -s extglob
param="/var/tmp/test"
param="${param/%\/*([^\/])//new_test}"
If param= is part of the string:
shopt -s extglob
string='param="/var/tmp/test"'
string="${string/%\/*([^\/])\"//new}"
This might work for you:
echo 'param="/var/tmp/test"' | sed -r 's#(/(([^/]*/)*))[^"]*#\1newtest#'
param="/var/tmp/newtest"