Sed: uppercase lines if they start with an uppercase character

I want lines that start with an uppercase character to be uppercased; other lines should not be touched.
So this input:
cat myfile
a
b
Cc
should result in this output:
a
b
CC
I tried this command, but it does not match when I use grouping:
cat myfile | sed -r 's/\([A-Z]+.*\)/\U\1/g'
What am I doing wrong?

When you use the -r option, you must not put \ before parentheses used for grouping. So it should be:
sed -r 's/^([A-Z].*)/\U\1/' myfile
Also, notice that you need ^ to match the beginning of the line. The g modifier isn't needed, since you're matching the entire line.
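A minimal sketch of the difference between the two syntaxes, using the sample input from the question. It assumes GNU sed (needed for \U) and sets LC_ALL=C so that [A-Z] means exactly the ASCII capitals; the variable names are only for illustration.

```shell
# Sample input from the question.
input=$(printf 'a\nb\nCc\n')

# Extended syntax (-E, or the older GNU spelling -r): plain parentheses group.
ere_out=$(printf '%s\n' "$input" | LC_ALL=C sed -E 's/^([A-Z].*)/\U\1/')

# Basic syntax (no -E/-r): the grouping parentheses must be backslash-escaped.
bre_out=$(printf '%s\n' "$input" | LC_ALL=C sed 's/^\([A-Z].*\)/\U\1/')

printf '%s\n' "$ere_out"   # prints a, b, CC on separate lines
```

Both variables end up with the same text; mixing the two styles (escaped parentheses together with -r) is what made the original attempt look for literal ( and ) characters.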

cat myfile | sed 's/^\([A-Z].*\)$/\U\1/'

\U for uppercase conversion is a GNU sed extension.
Alternative for platforms where that is not available (e.g., macOS, with its BSD sed implementation):
awk '/^[A-Z]/ { print toupper($0); next } 1'
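A short sketch of how the awk version behaves on the question's sample input. toupper() is part of POSIX awk, so no GNU extension is involved; LC_ALL=C is an added assumption to keep [A-Z] limited to ASCII capitals.

```shell
# First rule: lines starting with A-Z are printed uppercased, and "next"
# skips the remaining rules for that line.
# Second rule: the bare pattern 1 is always true, so every other line is
# printed unchanged.
awk_out=$(printf 'a\nb\nCc\n' |
    LC_ALL=C awk '/^[A-Z]/ { print toupper($0); next } 1')

printf '%s\n' "$awk_out"   # prints a, b, CC on separate lines
```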

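sed '/^[A-Z].*[a-z]/ s/.*/\U&/' YourFile
This only rewrites lines that start with an uppercase letter and still contain a lowercase one, i.e. lines that are not already fully uppercase.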

This might work for you (GNU sed):
sed 's/^[[:upper:]].*/\U&/' file
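For comparison, a sketch of the two group-free spellings, assuming GNU sed. & stands for the whole match, so no parentheses are needed, and [[:upper:]] follows the current locale instead of hard-coding A-Z. The sample lines are made up for illustration.

```shell
# Whole-match form: the regex itself selects the lines, & is the matched text.
match_out=$(printf 'a\nb\nCc\n' | sed 's/^[[:upper:]].*/\U&/')

# Address form: the /.../ address selects the lines, the s command then
# uppercases the complete line.
addr_out=$(printf 'a\nb\nCc\nDD\n' | sed '/^[[:upper:]]/ s/.*/\U&/')

printf '%s\n' "$match_out"   # a, b, CC
printf '%s\n' "$addr_out"    # a, b, CC, DD
```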

Related

How to replace only specific spaces in a file using sed?

I have this content in a file where I want to replace spaces at certain positions with pipe symbol (|). I used sed for this, but it is replacing all the spaces in the string. But I don't want to replace the space for the 3rd and 4th string.
How to achieve this?
Input:
test test test test
My attempt:
sed -e 's/ /|/g' file.txt
Expected Output:
test|test|test test
Actual Output:
test|test|test|test
sed 's/ /\
/3;y/\n / |/'
Since a newline cannot already be present in the pattern space (sed reads its input one line at a time), you can change the third space to a newline, then use y to translate every newline to a space and every space to a pipe.
GNU sed can use \n in the replacement text:
sed 's/ /\n/3;y/\n / |/'
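The two steps can be traced on the question's input like this; a sketch that assumes GNU sed, since it relies on \n in the replacement text.

```shell
line='test test test test'

# Step 1: s/ /\n/3 turns only the 3rd space into a newline, a character the
#         line cannot contain yet, so it works as a safe placeholder.
# Step 2: y/\n / |/ maps newline -> space and space -> pipe in one pass.
result=$(printf '%s\n' "$line" | sed 's/ /\n/3;y/\n / |/')

printf '%s\n' "$result"   # test|test|test test
```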
If the original input doesn't contain any pipe characters, you can do
sed -e 's/ /|/g' -e 's/|/ /3' file
to retain the third white space. Otherwise see other answers.
You could replace the 'first space' twice, e.g.
sed -e 's/ /|/' -e 's/ /|/' file.txt
Or, if you want to specify the positions (e.g. the 2nd and 1st spaces):
sed -e 's/ /|/2' -e 's/ /|/1' file.txt
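The order of the numbered substitutions matters, because each one counts the spaces that are still left. A small sketch with a made-up input line:

```shell
line='a b c d'

# Highest position first: earlier spaces keep their numbering.
high_first=$(printf '%s\n' "$line" | sed -e 's/ /|/2' -e 's/ /|/1')

# Lowest position first: once the 1st space is gone, "2nd" now refers to
# what was originally the 3rd space.
low_first=$(printf '%s\n' "$line" | sed -e 's/ /|/1' -e 's/ /|/2')

printf '%s\n' "$high_first"   # a|b|c d
printf '%s\n' "$low_first"    # a|b c|d
```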
Using GNU sed to replace the first and second runs of one or more whitespace characters:
sed -i -E 's/\s+/|/;s/\s+/|/' file
Details:
-i - edits the file in place
-E - enables POSIX ERE syntax
s/\s+/|/ - replaces the first run of one or more whitespace characters
; - and then
s/\s+/|/ - replaces the second run of one or more whitespace characters on each line (if present).
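To try this without modifying a file, the same script can be run on a pipe (without -i). A sketch assuming GNU sed, because \s is a GNU extension; the input with mixed spaces and a tab is invented for illustration.

```shell
# Two spaces after the first word, then space + tab + space after the second.
result=$(printf 'test  test \t test test\n' | sed -E 's/\s+/|/;s/\s+/|/')

# Each run of whitespace collapses into a single pipe; the third run is kept.
printf '%s\n' "$result"   # test|test|test test
```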
Keep it simple and use awk, e.g. using any awk in any shell on every Unix box no matter what other characters your input contains:
$ awk '{for (i=1;i<NF-1;i++) sub(/ /,"|")} 1' file
test|test|test test
The above replaces all but the last " " on each line, assuming fields are separated by single spaces. If you want to replace a specific number of spaces, e.g. 2, then just change the loop condition to i<=2.
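A sketch of the fixed-count variant wrapped in a shell function. The function name replace_first_n and its argument are hypothetical, chosen only for this example.

```shell
# replace_first_n N: replace the first N spaces of every input line with "|".
replace_first_n() {
    # sub() replaces only the leftmost match, so calling it n times walks
    # through the first n spaces; extra calls are harmless no-ops.
    awk -v n="$1" '{ for (i = 1; i <= n; i++) sub(/ /, "|") } 1'
}

result=$(printf 'test test test test\n' | replace_first_n 2)
printf '%s\n' "$result"   # test|test|test test
```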

Delete line if string between the 4th and 5th delimiter is empty

"text";"text";"text";"text";;"text";"text"
If the 4th delimiter is immediately followed by another delimiter, the line should be deleted.
Currently I'm doing that with sed:
sed -n '/;;/!p' input.txt
Is this a reliable solution?
Thanks for help.
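This guards, to some extent, against escaped double quotes and ";" inside quoted fields (thanks to SLePort for the remark). The original line is saved with h and restored with g before it is printed:
sed -e 'h;s/\\"//g' -e ':c' -e 's/^\(\("[^"]*";\)*"[^"]*\);/\1/;t c' -e '/^\([^;]*;\)\{4\};/d;g'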
sed -r '/^([^;]+;){4}\s*;/d' input.txt
awk -F';' '$5' input.txt
To remove lines that have a ; directly after the fourth delimiter:
sed '/^\("*[^"]*"*;\)\{4\};/d' input.txt
This might work for you (GNU sed):
sed -r '/^("(\\.|[^"])*";){4};/d' file
If the first four double-quoted fields, each followed by a semicolon (where every character inside the quotes is either a backslash followed by any character, or anything other than a double quote), are immediately followed by a further semicolon, then the line is deleted.
A more efficient regexp would be:
sed -r '/^("[^"\\]*(\\.[^"\\]*)*";){4};/d' file
This uses the pattern normal*(abnormal normal*)*
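To illustrate why the plain /;;/ test from the question is not fully reliable, here is a sketch comparing it with the field-aware pattern on three made-up lines. It assumes GNU sed (-E) and that every field is double-quoted.

```shell
sample=$(printf '%s\n' \
    '"a";"b";"c";"d";;"e"' \
    '"a";"b";"c";"d";"e"' \
    '"a";"b;;c";"d";"e";"f"')

# Naive test: drops every line that contains ";;" anywhere, including the
# third line, where ";;" is data inside a quoted field.
naive=$(printf '%s\n' "$sample" | sed -n '/;;/!p')

# Field-aware test: only drops lines whose 5th field is empty.
field_aware=$(printf '%s\n' "$sample" |
    sed -E '/^("[^"\\]*(\\.[^"\\]*)*";){4};/d')

printf '%s\n' "$naive"         # only the second sample line
printf '%s\n' "$field_aware"   # the second and third sample lines
```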

Combine -e and -n sed options

I'm trying to convert all occurrences of a certain letter in the header of a file to lowercase. I can achieve this with two sed lines, but I would like to use one instead.
What I'm trying is this:
cat file.txt | sed -e 'n 1p' -e 's/U/u/g'
Suppose the letter I want to replace is 'U'.
I feel like I'm very close, but for some reason sed complains about an extra character after the 'n' command. In this case -n needs a parameter, so there should be no reason to complain.
Any hint?
This might work for you (GNU sed):
sed '1y/U/u/' file
Try this (GNU sed):
sed '1s/U/u/g' file
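Both answers rely on a line address: the leading 1 restricts the command to the first input line, so neither -n nor p is needed. A small sketch on a made-up two-line file:

```shell
sample=$(printf 'USER,UUID\nUSER,UUID\n')

# 1s/U/u/g: substitute every U on line 1 only.
with_s=$(printf '%s\n' "$sample" | sed '1s/U/u/g')

# 1y/U/u/: transliterate U to u on line 1 only (y always acts on all matches).
with_y=$(printf '%s\n' "$sample" | sed '1y/U/u/')

printf '%s\n' "$with_s"   # uSER,uuID then USER,UUID
```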

Using sed for substitution in next line

I am working on sed command to translate some text into another text.
cat text
<strong>ABC
</strong>
Command:
sed -e 's|<strong>(.*?)</strong>|//textbf{1}|g'
Expected Outcome: \textbf{ABC}
But with the above script I cannot get the expected output, since there is a newline between the tags. How can I handle such cases?
This might work for you (GNU sed):
sed -r '$!N;s|(<)(strong>)([^\n]*)\n\s*\1/\2|\\textbf{\3}|;P;D' file
or
sed '$!N;s|\(<\)\(strong>\)\([^\n]*\)\n\s*\1/\2|\\textbf{\3}|;P;D' file
sed -e 'N;s|<strong>\(.*\)\n</strong>|\\textbf{\1}|g'
As noted by CodeGnome and David Ravetti, the N command appends the next input line to the pattern space, which allows patterns to match across lines.
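A sketch of the N;P;D sliding-window idiom on a slightly larger made-up input, assuming GNU sed (for \s, \n inside brackets, and -E). The regex is a simplified variant of the ones above, and the replacement uses \\ to emit the single backslash of \textbf.

```shell
sample=$(printf '%s\n' 'x' '<strong>ABC' '</strong>' 'y')

# $!N  append the next line (except on the last line), so two lines are visible
# s    join the tag pair when it spans the two lines
# P    print only the first line of the pattern space
# D    delete that first line and restart with what is left
result=$(printf '%s\n' "$sample" |
    sed -E '$!N;s|<strong>([^\n]*)\n\s*</strong>|\\textbf{\1}|;P;D')

printf '%s\n' "$result"   # x, \textbf{ABC}, y on separate lines
```

Unlike a bare N, the P;D pair keeps a two-line window moving one line at a time, so a tag pair is found regardless of whether it starts on an odd or an even line.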

How do I get rid of this unicode character?

Any idea how to get rid of the irritating character U+0092 in a bunch of text files? I've tried all of the commands below, but none of them works. The character map calls it U+0092+control.
sed -i 's/\xc2\x92//' *
sed -i 's/\u0092//' *
sed -i 's///' *
Ah, I've found a way:
CHARS=$(python2 -c 'print u"\u0092".encode("utf8")')
sed 's/['"$CHARS"']//g'
But is there a direct sed method for this?
Try sed "s/\`//g" *. (I added the g so it will remove all the backticks it finds).
EDIT: It's not a backtick that OP wants to remove.
Following the solution in this question, this ought to work:
sed 's/\xc2\x92//g'
To demonstrate it does:
$ CHARS=$(python -c 'print u"asdf\u0092asdf".encode("utf8")')
$ echo $CHARS
asdf<funny glyph symbol>asdf
$ echo $CHARS | sed 's/\xc2\x92//g'
asdfasdf
Seeing as it's something you tried already, perhaps what is in your text file is not U+0092?
This might work for you (GNU sed):
echo "string containing funny character(s)" | sed -n 'l0'
This displays the string as sed sees it, with non-printable bytes shown as octal escapes. Then use:
echo "string containing funny character(s)" | sed 's/\onnn//g'
Where nnn is the octal value, to delete it/them.
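Putting the inspection and the removal together, as a sketch: it assumes GNU sed (for \xHH and \oNNN) and forces LC_ALL=C so that sed works on single bytes. The sample string is built with printf octal escapes for the two UTF-8 bytes of U+0092 (0xC2 0x92).

```shell
# "asdf", then U+0092 encoded as UTF-8 (octal 302 222), then "asdf".
sample=$(printf 'asdf\302\222asdf')

# Inspect: l prints non-printable bytes as octal escapes and marks the
# end of the line with $.
seen=$(printf '%s\n' "$sample" | LC_ALL=C sed -n 'l')

# Remove the two bytes, once with hex escapes and once with octal escapes.
cleaned_hex=$(printf '%s\n' "$sample" | LC_ALL=C sed 's/\xc2\x92//g')
cleaned_oct=$(printf '%s\n' "$sample" | LC_ALL=C sed 's/\o302\o222//g')

printf '%s\n' "$seen"          # asdf\302\222asdf$
printf '%s\n' "$cleaned_hex"   # asdfasdf
```

If the l output shows different octal values than 302 222, the file contains some other character (or another encoding), and those values are the ones to put into the substitution.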