How to use sed to change the first and second columns of a text file to uppercase - sed

I have a text file, input.txt, which contains:
april,december,month.gmail.com
lion,tiger,animal.gmail.com
How can I use sed to change the first and second columns to uppercase? Is there a way to do it?

With GNU sed:
sed 's/^[a-z]*,[a-z]*,/\U&/' file
s: substitute command
[a-z]*,: matches zero or more lowercase letters followed by a comma. The pattern is repeated for the second field
\U: turns the rest of the replacement into uppercase (a GNU extension)
&: refers to the matched string, so \U& uppercases the whole match
Or, if there are only three comma-separated fields:
sed 's/^[a-z].*,/\U&/' file
output:
APRIL,DECEMBER,month.gmail.com
LION,TIGER,animal.gmail.com
As @Sundeep suggests, the second sed command can be shortened to:
s/^.*,/\U&/
which converts all characters up to and including the last , on the line
For more on GNU sed substitution command, see this article
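As a quick check of the commands above, here is a minimal sketch that runs both variants on sample input. It assumes GNU sed (\U is a GNU extension). The function names upper_two and upper_to_last_comma and the four-field line a,b,c,d are made up for illustration. The four-field line shows why the shortened form is only safe when there are exactly three fields.

```shell
# Uppercase only the first two comma-separated fields (GNU sed only: \U).
upper_two() { sed 's/^[a-z]*,[a-z]*,/\U&/'; }

# Greedy variant: uppercases everything up to the LAST comma on the line.
upper_to_last_comma() { sed 's/^.*,/\U&/'; }

printf 'april,december,month.gmail.com\n' | upper_two   # prints: APRIL,DECEMBER,month.gmail.com
printf 'a,b,c,d\n' | upper_two                          # prints: A,B,c,d
printf 'a,b,c,d\n' | upper_to_last_comma                # prints: A,B,C,d
```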

Related

Unable to delete whitespace from string with tr, sed

I have a file that contains a whitespace character that I'm not able to successfully remove with command-line tools such as tr or sed. Here's the input:
2,  78 ,, 1
6, 74, ,1
and I want the output to look like:
2,78,,1
6,74,,1
Attempts
If I try tr -d "[[:space:]]" the result is 2, 78,,16,74,,1 which leaves a space character and removes the newline.
If I try sed 's/[[:space:]]//g' the result is
2, 78,,1
6,74,,1
which still leaves the space.
I converted the string to hex, and it seems the offending character is a0, but even then the results are not what I'd expect:
sed 's/\xa0//g' yields
2, �78 ,, 1
6, 74, ,1
Question
What is that whitespace character that is not getting caught by the [[:space:]] character class? How can I delete it?
The offending character is a UTF-8-encoded non-breaking space, with hex representation \xc2\xa0. You can remove all spaces, including non-breaking spaces, with
sed -E 's/[[:space:]]|\xc2\xa0//g'
Explanation
-E turns on extended regex to allow the | to represent logical OR
's/pattern/replacement/' substitutes pattern matches with the replacement text (in this case, an empty string), with /g repeating the pattern substitution multiple times per line
[[:space:]] matches most whitespace characters, including spaces and tabs
\xc2\xa0 is the hex code for the UTF-8 non-breaking space
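To confirm what the stray bytes are before deleting them, something like the following may help. It is only a sketch: the sample line is invented (the non-breaking space is written with the octal escapes \302\240, i.e. bytes c2 a0), sample and strip_ws are hypothetical helper names, and the \xHH escapes in the regex assume GNU sed.

```shell
# Hypothetical sample: a UTF-8 non-breaking space (bytes c2 a0) after the first comma.
sample() { printf '2,\302\240 78 ,, 1\n'; }

# Dump the bytes in hex; the pair "c2 a0" is the non-breaking space.
sample | od -An -tx1

# Delete ordinary whitespace and the non-breaking space (GNU sed: \xHH escapes).
strip_ws() { sed -E 's/[[:space:]]|\xc2\xa0//g'; }

sample | strip_ws   # prints: 2,78,,1
```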
The characters you want to remove are the non-printable ones (i.e. the ones not in the [:print:] character class) rather than just the ones in the [:space:] character class:
$ printf 'foo\xc2\xa0bar\n' > file
$ cat file
foo bar
$ tr -dc '[:print:]' < file
foobar$
but I notice the equivalent doesn't work in GNU sed or GNU awk, and I don't know why.

Sed command to break comma separated string up to a certain length

Example string
TEST,TEST1,TEST3,TEST4,TEST5
Expected output :
TEST,TEST1,
TEST3,TEST4,
TEST5
I want to split the data at the last comma before the 15th position.
Try this:
sed 's/.\{,15\},/&\n/g' <<< "string" # or
sed 's/.\{,15\},/&\n/g' file
.\{,15\}, matches a part of the input consisting of 0 to 15 characters followed by a comma. Since sed is greedy while matching patterns, it will match as many characters as it can.
&\n expands to the matched part followed by a line feed.
s/REGEXP/REPLACEMENT/g replaces every match against REGEXP with REPLACEMENT.
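Running the command on the example string from the question shows the greedy matching at work. This is a sketch assuming GNU sed (both \{,15\} with the lower bound omitted and \n in the replacement are GNU extensions); wrap_at_commas is a made-up name.

```shell
# Break after the last comma that still fits the limit (GNU sed only).
wrap_at_commas() { sed 's/.\{,15\},/&\n/g'; }

printf 'TEST,TEST1,TEST3,TEST4,TEST5\n' | wrap_at_commas
# TEST,TEST1,
# TEST3,TEST4,
# TEST5
```

Note that each match is up to 15 characters plus the comma, so an output line can be 16 characters long. If the comma has to fit within 15, \{,14\} would be the bound to use.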

Duplicating characters using sed

I am trying to duplicate the characters in certain words, but the sed script I wrote is not quite doing what I want it to. For example, I am trying to duplicate each character in a word like so:
FILE into FFIILLEE
I know how to remove the duplicate letters with:
sed 's/\(.\)\1/\1/g' file.txt
This is the sed script I wrote, but it just ends up duplicating the whole word and I just get:
FILE FILE
This is what I wrote:
sed 's/[A-Z]*/& &/g' file.txt
How can I grab each letter and duplicate just the letter?
A slight variation on your first script should work:
sed 's/\(.\)/\1\1/g' file.txt
Translation: For every character seen, replace it by itself followed by itself.
sed 's/[[:alpha:]]/&&/g' file.txt
[:alpha:] is the class of all letters; you could use [:alnum:] instead to include digits as well
& in the replacement stands for the whole text matched by the search pattern, in this case one letter
g applies the substitution to every occurrence on the line
Your problem was the * in the search pattern, which means any number of repetitions of the preceding item, so the pattern matched the whole word at once rather than each letter of the word
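The difference between the two answers shows up as soon as the input contains something other than letters. A small sketch (double_all and double_letters are hypothetical names, and the input FILE 42 is invented):

```shell
# Duplicate every character, whatever it is.
double_all() { sed 's/\(.\)/\1\1/g'; }

# Duplicate letters only; digits, spaces and punctuation are left alone.
double_letters() { sed 's/[[:alpha:]]/&&/g'; }

printf 'FILE 42\n' | double_all       # prints: FFIILLEE  4422
printf 'FILE 42\n' | double_letters   # prints: FFIILLEE 42
```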

SED command to remove words at the end of the string

I want to remove the last 2 words from a string that is in a file.
I first tried this command to delete the last word, but it did not work. Can someone help me?
sed 's/\w*$//' <file name>
my strings are like this
Input:
asbc/jahsf/jhdsflk/jsfh/ -0.001 (exam)
I want to remove both numerical value and the one in brackets.
Output:
asbc/jahsf/jhdsflk/jsfh/
Using GNU sed:
$ sed -r 's/([[:space:]]+[-+.()[:alnum:]]+){2}$//' file
asbc/jahsf/jhdsflk/jsfh/
How it works
[[:space:]]+ matches one or more spaces.
[-+.()[:alnum:]]+ matches the 'words' which are allowed to contain any number of plus or minus signs, periods, parens, or any alphanumeric characters.
Note that, when a period is inside square brackets, [.], it is just a period, not a wildcard: it does not need to be escaped.
([[:space:]]+[-+.()[:alnum:]]+) matches one or more spaces followed by a word.
([[:space:]]+[-+.()[:alnum:]]+){2}$ matches two words and the spaces which precede them.
Note the use of character classes like [:space:] and [:alnum:]. Unlike old-fashioned ranges such as [a-zA-Z0-9], these classes are Unicode-safe.
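Here is the same substitution run on the sample line, plus one case where it deliberately does nothing. This is a sketch: drop_last_two is a made-up name, the second input is invented, and -E is used in place of -r on the assumption that your sed accepts it (recent GNU sed and BSD sed both do).

```shell
# Remove the last two "words" and the whitespace before each of them.
drop_last_two() { sed -E 's/([[:space:]]+[-+.()[:alnum:]]+){2}$//'; }

printf 'asbc/jahsf/jhdsflk/jsfh/ -0.001 (exam)\n' | drop_last_two
# prints: asbc/jahsf/jhdsflk/jsfh/

# A word containing a character outside the bracket list (here "_")
# prevents a match, so the line comes back unchanged.
printf 'asbc/jahsf/ -0.001 (ex_am)\n' | drop_last_two
# prints: asbc/jahsf/ -0.001 (ex_am)
```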
OSX (BSD) sed
The above was tested on GNU sed. For BSD sed, try:
sed -E 's/([[:space:]][[:space:]]*[-+.()[:alnum:]][-+.()[:alnum:]]*){2}$//' file
To remove everything that follows a number with decimal places
This looks for a decimal number with optional sign and removes it, the spaces which precede it, and everything which follows it:
$ sed -r 's/[[:space:]]+[-+]?[[:digit:]]+[.][[:digit:]]+[[:space:]].*//' file
asbc/jahsf/jhdsflk/jsfh/
How it works:
[[:space:]]+ matches one or more spaces
[-+]? matches zero or one signs.
[[:digit:]]+ matches one or more digits.
[.] matches a decimal point (period).
[[:digit:]]+ matches one or more digits following the decimal point.
[[:space:]] matches a space following the number.
.* matches anything which follows.
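A short sketch of this variant, including one edge case worth knowing about. The name drop_from_decimal and the second input line are invented for illustration, and -E is assumed to be accepted by your sed as an equivalent of -r.

```shell
# Delete a signed decimal number, the whitespace before it, and everything after it.
drop_from_decimal() {
  sed -E 's/[[:space:]]+[-+]?[[:digit:]]+[.][[:digit:]]+[[:space:]].*//'
}

printf 'asbc/jahsf/jhdsflk/jsfh/ -0.001 (exam)\n' | drop_from_decimal
# prints: asbc/jahsf/jhdsflk/jsfh/

# The pattern requires whitespace after the number, so a number at the
# very end of the line is not removed.
printf 'asbc/jahsf/ 1.5\n' | drop_from_decimal
# prints: asbc/jahsf/ 1.5
```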
It looks like there is a tab between what you want to keep and what you want to get rid of. I don't have Linux in front of me, but try this:
sed 's/\t.*//'
This assumes your strings are always formatted similarly, which is what I take from your comment.
This might work for you (GNU sed):
sed -r 's/\s+\S+\s+\S+\s*$//' file
or if you prefer:
sed -r 's/(\s+\S+){2}\s*$//' file
This matches and removes: one or more whitespaces followed by one or more non-whitespaces twice followed by zero or more whitespaces at the end of the line.
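Because \S accepts any non-whitespace character, this version also handles words with punctuation and lines with trailing blanks. A sketch with made-up inputs (drop_last_two_any is a hypothetical name; \s and \S are GNU sed extensions, and -E is assumed to be accepted as an equivalent of -r):

```shell
# Remove the last two whitespace-separated words plus any trailing whitespace.
drop_last_two_any() { sed -E 's/(\s+\S+){2}\s*$//'; }

printf 'keep/ foo_bar baz!  \n' | drop_last_two_any   # prints: keep/
printf 'a b c d\n' | drop_last_two_any                # prints: a b
```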

Insert newline after pattern with changing number in sed

I want to insert a newline after the following pattern
lcl|NC_005966.1_gene_750
The last number (750 in this case) changes; the numbers range from 1 to 3407.
How can I tell sed to keep this pattern together and not split it after the first digit?
So far I have found:
sed 's/lcl|NC_005966.1_gene_[[:digit:]]/&\n/g' file
But this breaks off, after the first digit.
Try:
sed 's/lcl|NC_005966.1_gene_[[:digit:]]*/&\n/g' file
(note the *)
Alternatively, you could say:
sed '/lcl|NC_005966.1_gene_[[:digit:]]/G' file
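which appends an empty line after every line that contains the pattern (G appends a newline followed by the hold space, which is empty here). Note that this breaks after the whole line, not directly after the match.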
sed 's/lcl|NC_005966\.1_gene_[[:digit:]][[:digit:]]*/&\
/g' file
You need to escape the . because it is a regex metacharacter, you need [[:digit:]][[:digit:]]* to represent one or more digits, and you need \ followed by a literal newline for portability across seds.
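For a quick test of the portable form, the command can be wrapped in a function and fed a sample line. This is just a sketch: split_after_gene_id is a made-up name and the trailing ATGC is invented data standing in for whatever follows the id.

```shell
# Insert a newline right after each gene id. Backslash followed by a
# literal newline in the replacement works in both GNU and BSD sed.
split_after_gene_id() {
  sed 's/lcl|NC_005966\.1_gene_[[:digit:]][[:digit:]]*/&\
/g'
}

printf 'lcl|NC_005966.1_gene_750ATGC\n' | split_after_gene_id
# lcl|NC_005966.1_gene_750
# ATGC
```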