capitalize names having international letters like éèàö - sed

My sed attempts on RHEL 6.3:
$ export LC_ALL=fr_FR.utf-8
$ sed 's/ \([a-zA-Zé]\)\([^ ]*\) /[\u\1\L\2\E] /g' <<< " hélène NOËL étienne "
hélène NOËL étienne
$ export LC_ALL=C
$ sed 's/ \([a-zA-Zé]\)\([^ ]*\) /[\u\1\L\2\E] /g' <<< " hélène NOËL étienne "
[Hÿlÿne] [Noÿl] [ÿtienne]
$ sed --version
GNU sed version 4.2.1
[...]
Is sed able to output the following?
[Hélène] [Noël] [Étienne]

Is this OK for you?
kent$ echo " hélène NOËL étienne "|sed -r 's/(\S)(\S+)/[\U\1\L\2]/g'
[Hélène] [Noël] [Étienne]
My sed version is a bit different from yours, but I think the command should work there too:
kent$ sed --version |head -1
sed (GNU sed) 4.2.2
Here are my locale settings, in case they matter:
kent$ echo $LANG
en_US.utf8
kent$ locale
LANG=en_US.utf8
LC_CTYPE="en_US.utf8"
LC_NUMERIC="en_US.utf8"
LC_TIME="en_US.utf8"
LC_COLLATE="en_US.utf8"
LC_MONETARY="en_US.utf8"
LC_MESSAGES="en_US.utf8"
LC_PAPER="en_US.utf8"
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT="en_US.utf8"
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=

Kent's answer did not solve my issue, but I had not given him all my constraints. My CSV input file looks like this:
sfou;STéphane Foù - stephane.fou#example.com;;
fbar;frédéric bâr - frederic.bar#example.com;;
hnoel;Hélène NOËL - helene.noel#example.com;;
The sed script shall capitalize the names only:
sfou;Stéphane Foù - stephane.fou#example.com;;
fbar;Frédéric Bâr - frederic.bar#example.com;;
hnoel;Hélène Noël - helene.noel#example.com;;
Based on Kent's help, I got the expected result with this script:
LC_ALL=fr_FR sed -r 's/(\w)(\w*) /\U\1\L\2 /g' test.cvs
Other locales do not give the right result:
$ LANG=fr_FR.utf8 LC_ALL= sed -r 's/(\w)(\w*) /[\U\1\L\2] /g' test.cvs
sfou;STé[Phane] Foù - stephane.fou#example.com;;
fbar;frédé[Ric] bâ[R] - frederic.bar#example.com;;
hnoel;Hélè[Ne] NOË[L] - helene.noel#example.com;;
$ LANG=C LC_ALL= sed -r 's/(\w)(\w*) /[\U\1\L\2] /g' test.cvs
sfou;STé[Phane] Foù - stephane.fou#example.com;;
fbar;frédé[Ric] bâ[R] - frederic.bar#example.com;;
hnoel;Hélè[Ne] NOË[L] - helene.noel#example.com;;
$ LANG=en_US.utf8 LC_ALL= sed -r 's/(\w)(\w*) /[\U\1\L\2] /g' test.cvs
sfou;STé[Phane] Foù - stephane.fou#example.com;;
fbar;frédé[Ric] bâ[R] - frederic.bar#example.com;;
hnoel;Hélè[Ne] NOË[L] - helene.noel#example.com;;
Locales en_US and fr_FR (without .utf8) are OK:
$ LANG=en_US LC_ALL= sed -r 's/(\w)(\w*) /[\U\1\L\2] /g' test.cvs
sfou;[Stéphane] [Foù] - stephane.fou#example.com;;
fbar;[Frédéric] [Bâr] - frederic.bar#example.com;;
hnoel;[Hélène] [Noël] - helene.noel#example.com;;
$ LANG=fr_FR LC_ALL= sed -r 's/(\w)(\w*) /[\U\1\L\2] /g' test.cvs
sfou;[Stéphane] [Foù] - stephane.fou#example.com;;
fbar;[Frédéric] [Bâr] - frederic.bar#example.com;;
hnoel;[Hélène] [Noël] - helene.noel#example.com;;
Note: I have discovered \w from CodeGnome's links.
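One plausible reading of the results above (an assumption, since the encoding of test.cvs is not stated) is that the file is stored in a single-byte encoding such as ISO-8859-1, so \w only treats the accented bytes as letters when the locale's character set matches the file. The sketch below wraps the same substitution in a hypothetical helper named capitalize_names and runs it on an ASCII-only sample line, which behaves the same in any locale. It assumes GNU sed, because \U, \L, \w and -r are GNU extensions.

```shell
# Hypothetical helper: capitalize every word that is followed by a space.
# Assumes GNU sed (\U, \L, \w and -r are GNU extensions).
capitalize_names() {
    sed -r 's/(\w)(\w*) /\U\1\L\2 /g'
}

# ASCII-only sample line (made up for illustration), so the result
# does not depend on the locale in effect.
printf '%s\n' 'jdoe;jOHN DOE - john.doe#example.com;;' | capitalize_names
```

The login and the e-mail address are left alone only because neither is followed by a space; a stricter script would have to anchor on the ; separators.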

Related

Removing repeated characters with sed command

How to remove repeated characters or symbols in a string
some text\n\n\n some other text\n\n more text\n
How can I make something like this using sed or another command?
some text\n some other text\n more text\n
I can remove \n with something like sed s/\n//g, but that would remove all the newline characters.
You can use
sed '/^$/d' file > newfile
In GNU sed, you can use inline replacement with -i option:
sed -i '/^$/d' file
On macOS and FreeBSD, in-place replacement with sed can be done with either of:
sed -i '' '/^$/d' file
sed -i.bak '/^$/d' file
See the online demo:
#!/bin/bash
s=$(echo -e "some text\n\n\n some other text\n\n more text\n")
sed '/^$/d' <<< "$s"
Output:
some text
some other text
more text
You can also use tr, if your version supports squeezing repeats:
$ echo -e 'ab\n\ncd' | tr --squeeze-repeats '\n'
ab
cd
Given the following [input] or a file that is similar:
printf "some text\n\n\n some other text\n\n more text\n" | [ one of the pipes below... ]
Any of these work:
[input] | sed -n '/[^[:space:]]/p'
Or:
[input] | sed '/^$/d'
Or, if you want to filter ^[spaces or tabs]\n also:
[input] | sed '/^[[:blank:]]*$/d'
Or with awk:
[input] | awk 'NF'
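The variants above differ only in how they treat lines that contain nothing but spaces or tabs. As a small sketch, the hypothetical function sample_input below emits the question's text with one extra whitespace-only line (added here for illustration) so that the difference becomes visible:

```shell
# Sample input: two empty lines plus one line holding only a space and a tab.
sample_input() {
    printf 'some text\n\n \t\n some other text\n\n more text\n'
}

sample_input | sed '/^$/d'               # removes only truly empty lines
sample_input | sed '/^[[:blank:]]*$/d'   # also removes the space/tab-only line
sample_input | awk 'NF'                  # keeps lines with at least one field
sample_input | tr -s '\n'                # squeezes runs of consecutive newlines
```

The first and last commands keep the whitespace-only line; the second and third drop it.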

How to replace < and > symbols in a file with \< and \> respectively with sed inside Jenkins Groovy?

I have a file named body.txt which contains the following:
<table><tr><td>Hello</td><td>World</td></tr></table>
I want to put \ in front of each < and > so that the file body.txt contains the following:
\<table\>\<tr\>\<td\>Hello\</td\>\<td\>World\</td\>\</tr\>\</table\>
I am trying to do this from inside a Jenkins Groovy script.
I tried the following approaches:
Approach 1:
sh "sed -i 's/</\\</g' body.txt"
sh "sed -i 's/>/\\>/g' body.txt"
Approach 2:
sh '''
#!bin/bash
sed -i "s/</\\</g" body.txt
sed -i "s/>/\\>/g" body.txt
'''
Approach 3:
env.lt="<"
env.lts="\\<"
env.gt=">"
env.gts="\\>"
sh '''
#!bin/bash
sed -i "s/${lt}/${lts}/g" body.txt
sed -i "s/${gt}/${gts}/g" body.txt
'''
Approach 4:
env.lt="<"
env.lts="\\<"
env.gt=">"
env.gts="\\>"
sh "sed -i 's/${lt}/${lts}/g' body.txt"
sh "sed -i 's/${gt}/${gts}/g' body.txt"
Approach 5:
sh "cat body.txt |tr '<' '\\<' > body1.txt"
sh "cat body1.txt|tr '>' '\\>' > body2.txt"
sh "cp body2.txt body.txt"
sh "rm body1.txt body2.txt"
None of these approaches works.
I do not get any error, but the < and > symbols are not replaced.
I think this is what you want. You need gsub in awk to perform the replacement, and you need to write the backslash twice (\\) in the replacement string, e.g. to replace > with \>.
cat body.txt | awk '{gsub (/>/,"\\>");print}' | awk '{gsub (/</,"\\<");print}'
Another way using sed :
cat body.txt | sed 's,>,\\>,g' | sed 's,<,\\<,g'
The below command worked for me that ran inside Jenkins Groovy declarative pipeline script:
#!/bin/bash
cat body.txt | awk '{gsub (/>/,"\\\\>");print}' | awk '{gsub (/</,"\\\\<");print}'
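The extra backslashes are a matter of quoting layers: Groovy string literals treat the backslash as an escape character, so each backslash written in the pipeline script is halved before the shell sees it. As a sketch of what the shell itself needs to receive, the hypothetical function escape_angle_brackets below handles both symbols in a single sed call. When pasting it into a Groovy sh step, each backslash would presumably have to be doubled again, as in the awk command above.

```shell
# Hypothetical helper: put a backslash in front of every < and >.
# In the replacement, \\ is a literal backslash and & is the matched character.
escape_angle_brackets() {
    sed 's/[<>]/\\&/g'
}

printf '%s\n' '<table><tr><td>Hello</td></tr></table>' | escape_angle_brackets
```

To edit body.txt itself, the output can be redirected to a temporary file that is then moved over the original.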

Search replace regular expression variable using sed

This is probably a trivial one:
I have a file (my.file) with these lines:
>h1_c1
>h1_c2
>h1_c3
>h2_c1
>h2_c2
>h2_c3
and I want to change it in place to be:
>c1_h1
>c2_h1
>c3_h1
>c1_h2
>c2_h2
>c3_h2
I thought this ought to do it:
sed -i 's/\(\>\)\(h1\)\(\_\)\(.*\)/\1 \4 \3 \2/g' my.file
sed -i 's/\(\>\)\(h2\)\(\_\)\(.*\)/\1 \4 \3 \2/g' my.file
but it doesn't seem to work. How do I do it?
The obvious sed for your example is:
$ sed -i~ -e 's/^>\(h[0-9]\)_\(c[0-9]\)/>\2_\1/' *.foo
I tested this and it works for your example file.
Try this awk
awk -F">|_" '{print ">"$3"_"$2}' my.file > tmp && mv tmp my.file
awk -F">|_" '{print ">"$3"_"$2}' my.file
>c1_h1
>c2_h1
>c3_h1
>c1_h2
>c2_h2
>c3_h2
You can try this sed,
sed 's/>\(h[1-2]\)_\(.*\)/>\2_\1/' yourfile
(OR)
sed -r 's/>(h[1-2])_(.*)/>\2_\1/' yourfile
kent$ sed -r 's/>([^_]*)_(.*)/>\2_\1/' f
>c1_h1
>c2_h1
>c3_h1
>c1_h2
>c2_h2
>c3_h2
Add -i if you want the change to happen in place.
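One likely reason the original attempt fails is that in GNU sed \> is a word-boundary anchor rather than a literal >, and the replacement also inserted spaces between the groups. A minimal sketch that avoids both issues, using only basic regular expressions and a hypothetical function name swap_fields:

```shell
# Hypothetical helper: swap the two parts around the first underscore,
# keeping the leading '>' in place.
swap_fields() {
    sed 's/^>\([^_]*\)_\(.*\)$/>\2_\1/'
}

printf '>h1_c1\n>h2_c3\n' | swap_fields
```

Lines that do not start with > or that contain no underscore pass through unchanged.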

How to add new line using sed on MacOS?

I want to add a newline between </a> and <a><a>, so that this:
</a><a><a>
becomes:
</a>
<a><a>
I tried
sed 's#</a><a><a>#</a>\n<a><a>#g' filename
but it didn't work.
On a Mac there are two ways to do this, with the system sed or with GNU sed (gsed):
echo foo | sed 's/f/f\'$'\n/'
echo foo | gsed 's/f/f\n/g'
Some sed implementations, notably the BSD sed on macOS, don't interpret \n in the replacement as a newline; you need an actual newline, preceded by a backslash:
$ echo foo | sed 's/f/f\n/'
fnoo
$ echo foo | sed 's/f/f\
> /'
f
oo
$
Or you can use:
echo foo | sed $'s/f/f\\\n/'
Or just brute-force it. This worked for me when inserting text on macOS:
sed "2 i \\\n${TEXT}\n\n" -i ${FILE_PATH_NAME}
sed "2 i \\\nSomeText\n\n" -i textfile.txt
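Applied to the question's own substitution, the backslash-plus-literal-newline form can be written without shell-specific $'...' quoting by keeping the newline in a variable. This is a sketch; the variable nl and the function split_anchor_tags are names made up for illustration.

```shell
# A variable holding exactly one newline character.
nl='
'

# Hypothetical helper: break the line between </a> and <a><a>.
# Inside double quotes, \\ yields one backslash and ${nl} the newline,
# so sed sees a backslash followed by a real newline in the replacement.
split_anchor_tags() {
    sed "s#</a><a><a>#</a>\\${nl}<a><a>#g"
}

printf '%s\n' '</a><a><a>' | split_anchor_tags
```

This form should behave the same with GNU sed and with BSD sed, since it does not rely on \n being interpreted in the replacement.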

how to use result from a pipe in the next sed command?

I want to use sed to do this. I have 2 files:
keys.txt:
host1
host2
test.txt
host1 abc
host2 cdf
host3 abaasdf
I want to use sed to remove any lines in test.txt that contains the keyword in keys.txt. So the result of test.txt should be
host3 abaasdf
Can somebody show me how to do that with sed?
Thanks,
I'd recommend using grep for this (especially fgrep since there are no regexps involved), so
fgrep -v -f keys.txt test.txt
does the job. A quick sed solution also works:
sed -i.ORIGKEYS.txt -e 's_^_/_' -e 's_$_/d_' keys.txt
sed -f keys.txt test.txt
(This modifies the original keys.txt in place - with backup - to a sourceable sed script.)
fgrep -v -f is the best solution. Here are a couple of alternatives:
A combination of comm and join
comm -13 <(join keys.txt test.txt) test.txt
or awk
awk 'NR==FNR {key[$1]; next} $1 in key {next} 1' keys.txt test.txt
This might work for you (GNU sed):
sed 's|.*|/^&\\>/d|' keys.txt | sed -f - test.txt
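As a sketch of the grep approach with whole-word matching, the hypothetical function remove_keyed_lines below uses grep -F, the current spelling of fgrep, and adds -w so that a key such as host1 does not also remove a line for host10. The host10 line is not in the question and is added here only to show that case.

```shell
# Hypothetical helper: print the lines of data file $2 that contain none of
# the keys listed (one per line) in key file $1, matching whole words only.
remove_keyed_lines() {
    grep -F -v -w -f "$1" "$2"
}

# Demo on temporary copies of the question's files.
workdir=$(mktemp -d)
printf 'host1\nhost2\n' > "$workdir/keys.txt"
printf 'host1 abc\nhost2 cdf\nhost3 abaasdf\nhost10 xyz\n' > "$workdir/test.txt"
remove_keyed_lines "$workdir/keys.txt" "$workdir/test.txt"
rm -r "$workdir"
```

Unlike the sed variants, this matches a key anywhere on the line, not only at the start.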