I have a text file, input.txt, which contains:
april,december,month.gmail.com
lion,tiger,animal.gmail.com
How can I use sed to change the first and second columns to uppercase?
With GNU sed:
sed 's/^[a-z]*,[a-z]*,/\U&/' file
s: substitute command
[a-z]*,: matches zero or more lowercase letters followed by a comma. The pattern is repeated for the second field
the \U sequence turns the replacement to uppercase
\U is applied to &, which references the matched string
Or, if there are only three comma-separated fields:
sed 's/^[a-z].*,/\U&/' file
output:
APRIL,DECEMBER,month.gmail.com
LION,TIGER,animal.gmail.com
As @Sundeep suggests, the second sed can be shortened to:
s/^.*,/\U&/
which uppercases every character up to and including the last comma on the line
For more on GNU sed substitution command, see this article
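As a quick check of the \U approach, here is a minimal sketch. It assumes GNU sed (the \U escape is a GNU extension), and the function name upper_first_two is made up for illustration. It uses [^,]* instead of [a-z]* so that fields containing digits or other characters are matched too; that is a variation, not the command from the answer.

```shell
# Sketch, GNU sed only: uppercase the first two comma-separated fields.
# upper_first_two is a hypothetical helper name; it reads stdin.
upper_first_two() {
    # [^,]* = any run of non-comma characters; & = the whole match.
    sed 's/^[^,]*,[^,]*,/\U&/'
}

printf '%s\n' 'april,december,month.gmail.com' 'lion,tiger,animal.gmail.com' |
    upper_first_two
```

The third field is left alone because the match stops at the second comma.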
I have a file that contains a whitespace character that I'm not able to successfully remove with command-line tools such as tr or sed. Here's the input:
2, 78 ,, 1
6, 74, ,1
and I want the output to look like:
2,78,,1
6,74,,1
Attempts
If I try tr -d "[[:space:]]" the result is 2, 78,,16,74,,1 which leaves a space character and removes the newline.
If I try sed 's/[[:space:]]//g' the result is
2, 78,,1
6,74,,1
which still leaves the space.
I converted the string to hex, and it seems the offending character is a0, but even then the results are not what I'd expect:
sed 's/\xa0//g' yields
2, �78 ,, 1
6, 74, ,1
Question
What is that whitespace character that is not getting caught by the [[:space:]] character class? How can I delete it?
The offending character is a UTF-8-encoded non-breaking space, with hex representation \xc2\xa0. You can remove all spaces, including non-breaking spaces, with
sed -E 's/[[:space:]]|\xc2\xa0//g'
Explanation
-E turns on extended regex to allow the | to represent logical OR
's/pattern/replacement/' substitutes pattern matches with the replacement text (in this case, an empty string), with /g repeating the pattern substitution multiple times per line
[[:space:]] matches most whitespace characters, including spaces and tabs
\xc2\xa0 is the hex code for the UTF-8 non-breaking space
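To try this without the original file, the following sketch builds the input with printf octal escapes (\302\240 is the same byte pair as \xc2\xa0). The function name strip_spaces is hypothetical, and LC_ALL=C is my own addition so that sed treats the input as raw bytes regardless of the current locale; GNU sed is assumed for the \x escapes.

```shell
# Sketch, GNU sed assumed: delete ASCII whitespace and UTF-8 non-breaking spaces.
# strip_spaces is a hypothetical helper name; it reads stdin.
strip_spaces() {
    # LC_ALL=C (an assumption, not in the answer) forces byte-wise matching.
    LC_ALL=C sed -E 's/[[:space:]]|\xc2\xa0//g'
}

# \302\240 is the octal spelling of the bytes c2 a0.
printf '2,\302\240 78 ,, 1\n6, 74, ,1\n' | strip_spaces
```

Because sed works line by line, the newlines at the end of each line survive, unlike with tr -d.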
The characters you want to remove are the non-printable ones (i.e. the ones not in the [:print:] character class) rather than just the ones in the [:space:] character class:
$ printf 'foo\xc2\xa0bar\n' > file
$ cat file
foo bar
$ tr -dc '[:print:]' < file
foobar$
but I notice the equivalent doesn't work in GNU sed or GNU awk, and I don't know why.
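Note that the tr command above also deletes the newline (hence the prompt right after foobar). A small variation that keeps line breaks is sketched below; keep_printable is a hypothetical name, and LC_ALL=C is an assumption I add so that only ASCII counts as printable. The catch is that this also drops every other non-ASCII character, so it only suits data that should be plain ASCII.

```shell
# Sketch: delete every byte that is neither printable ASCII nor a newline.
# keep_printable is a hypothetical helper name; it reads stdin.
keep_printable() {
    # -d delete, -c complement: remove everything NOT in the listed set.
    LC_ALL=C tr -dc '[:print:]\n'
}

printf 'foo\302\240bar\n' | keep_printable
```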
Example string
TEST,TEST1,TEST3,TEST4,TEST5
Expected output :
TEST,TEST1,
TEST3,TEST4,
TEST5
I want to split the data at the last comma before the 15th position.
Try this:
sed 's/.\{,15\},/&\n/g' <<< "string" # or
sed 's/.\{,15\},/&\n/g' file
.\{,15\}, matches a part of the input consisting of 0 to 15 characters followed by a comma. Since sed is greedy when matching patterns, it will match as many characters as it can.
&\n expands to the matched part followed by a line feed.
s/REGEXP/REPLACEMENT/g replaces every match against REGEXP with REPLACEMENT.
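Here is the same idea wrapped in a function so it can be tried on the example string. It is only a sketch: wrap_at_comma is a made-up name, the interval is spelled \{0,15\} (the explicit lower bound is the more portable form), and \n in the replacement still assumes GNU sed.

```shell
# Sketch, GNU sed assumed for \n in the replacement.
# wrap_at_comma is a hypothetical helper name; it reads stdin.
wrap_at_comma() {
    # Greedy: take up to 15 characters that end in a comma, then break the line.
    sed 's/.\{0,15\},/&\n/g'
}

printf 'TEST,TEST1,TEST3,TEST4,TEST5\n' | wrap_at_comma
```

The last chunk (TEST5) has no comma after it, so it is left as it is.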
I am trying to duplicate characters in certain words, but the sed script I wrote is not quite doing what I want. For example, I am trying to duplicate each character in a word like so:
FILE into FFIILLEE
I know how to remove the duplicate letters with :
sed 's/\(.\)\1/\1/g' file.txt
This is the sed script I wrote, but it just ends up duplicating the whole word and I just get:
FILE FILE
This is what I wrote:
sed 's/[A-Z]*/& &/g' file.txt
How can I grab each letter and duplicate just the letter?
A slight variation on your first script should work:
sed 's/\(.\)/\1\1/g' file.txt
Translation: For every character seen, replace it by itself followed by itself.
sed 's/[[:alpha:]]/&&/g' file.txt
The [:alpha:] class covers every letter; you could use [:alnum:] instead to include digits as well
& in the replacement stands for the whole matched text, in this case one letter
g applies the substitution to every occurrence on the line
Your problem was the * in the search pattern: it means "zero or more of the preceding item", so the pattern matched the whole word at once instead of each letter of the word
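The two answers differ in what they double, which the sketch below makes visible. The function names double_chars and double_letters are hypothetical; the sed commands are the ones from the answers, reading stdin instead of file.txt.

```shell
# Sketch comparing the two answers; both helper names are made up.
double_chars() {
    # . matches any character, so spaces and digits are doubled too.
    sed 's/\(.\)/\1\1/g'
}

double_letters() {
    # [[:alpha:]] restricts the doubling to letters.
    sed 's/[[:alpha:]]/&&/g'
}

printf 'FILE 42\n' | double_chars
printf 'FILE 42\n' | double_letters
```

With the input FILE 42, the first variant doubles the space and the digits as well, while the second leaves them untouched.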
I want to remove last 2 words in the string which is in a file.
I first tried this command to delete just the last word, but it doesn't work. Can someone help me?
sed 's/\w*$//' <file name>
my strings are like this
Input:
asbc/jahsf/jhdsflk/jsfh/ -0.001 (exam)
I want to remove both numerical value and the one in brackets.
Output:
asbc/jahsf/jhdsflk/jsfh/
Using GNU sed:
$ sed -r 's/([[:space:]]+[-+.()[:alnum:]]+){2}$//' file
asbc/jahsf/jhdsflk/jsfh/
How it works
[[:space:]]+ matches one or more spaces.
[-+.()[:alnum:]]+ matches the 'words' which are allowed to contain any number of plus or minus signs, periods, parens, or any alphanumeric characters.
Note that, when a period is inside square brackets, [.], it is just a period, not a wildcard: it does not need to be escaped.
([[:space:]]+[-+.()[:alnum:]]+) matches one or more spaces followed by a word.
([[:space:]]+[-+.()[:alnum:]]+){2}$ matches two words and the spaces which precede them.
Note the use of character classes like [:space:] and [:alnum:]. Unlike the old-fashioned classes like [a-zA-Z0-9], these classes are Unicode-safe.
OSX (BSD) sed
The above was tested on GNU sed. For BSD sed, try:
sed -E 's/([[:space:]][[:space:]]*[-+.()[:alnum:]][-+.()[:alnum:]]*){2}$//' file
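A small sketch for trying the pattern on more than one kind of line. The name strip_two_trailing_words is hypothetical, the second input line is made up, and -E is used in place of -r on the assumption that the sed in use accepts it (recent GNU sed and BSD sed both do).

```shell
# Sketch: remove the last two "words" when both consist only of the allowed characters.
# strip_two_trailing_words is a hypothetical helper name; it reads stdin.
strip_two_trailing_words() {
    sed -E 's/([[:space:]]+[-+.()[:alnum:]]+){2}$//'
}

printf '%s\n' \
    'asbc/jahsf/jhdsflk/jsfh/ -0.001 (exam)' \
    'only/one/ (exam)' |
    strip_two_trailing_words
```

The second line has only one trailing word, so the {2} cannot be satisfied and the line passes through unchanged.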
To remove everything that follows a number with decimal places
This looks for a decimal number with optional sign and removes it, the spaces which precede it, and everything which follows it:
$ sed -r 's/[[:space:]]+[-+]?[[:digit:]]+[.][[:digit:]]+[[:space:]].*//' file
asbc/jahsf/jhdsflk/jsfh/
How it works:
[[:space:]]+ matches one or more spaces
[-+]? matches zero or one signs.
[[:digit:]]+ matches one or more digits.
[.] matches a decimal point (period).
[[:digit:]]+ matches one or more digits following the decimal point.
[[:space:]] matches a space following the number.
.* matches anything which follows.
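One thing to be aware of: the pattern above requires a space after the number, so a line that ends right at the number is not changed. The sketch below is a variation of mine, not the answer's command: it makes the trailing part optional with ([[:space:]].*)?$. The name cut_from_decimal and the extra sample lines are made up.

```shell
# Sketch: cut from the first whitespace-preceded decimal number to the end of the line.
# cut_from_decimal is a hypothetical helper name; it reads stdin.
cut_from_decimal() {
    # ([[:space:]].*)?$ = either "space plus the rest" or nothing, then end of line.
    sed -E 's/[[:space:]]+[-+]?[[:digit:]]+[.][[:digit:]]+([[:space:]].*)?$//'
}

printf '%s\n' \
    'asbc/jahsf/jhdsflk/jsfh/ -0.001 (exam)' \
    'path/ 1.5' \
    'v1.5 keep' |
    cut_from_decimal
```

The third line is left alone because its number is not preceded by whitespace.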
It looks like there is a tab between what you want to keep and what you want to get rid of. I don't have Linux in front of me, but try this:
sed 's/\t.*//'
This assumes your strings are always formatted similarly, which is what I take from your comment.
This might work for you (GNU sed):
sed -r 's/\s+\S+\s+\S+\s*$//' file
or if you prefer:
sed -r 's/(\s+\S+){2}\s*$//' file
This matches and removes, at the end of the line: one or more whitespace characters followed by one or more non-whitespace characters, twice, followed by zero or more whitespace characters.
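\s and \S are GNU extensions. If that matters, the same pattern can be spelled with POSIX character classes, as in this sketch. The name drop_last_two_fields is hypothetical, and the translation ([[:space:]] for \s, [^[:space:]] for \S) is my own rewrite rather than part of the answer.

```shell
# Sketch: the \s/\S pattern rewritten with POSIX classes.
# drop_last_two_fields is a hypothetical helper name; it reads stdin.
drop_last_two_fields() {
    sed -E 's/([[:space:]]+[^[:space:]]+){2}[[:space:]]*$//'
}

printf 'asbc/jahsf/jhdsflk/jsfh/ -0.001 (exam)\n' | drop_last_two_fields
```

Unlike the version restricted to [-+.()[:alnum:]], this treats any run of non-whitespace characters as a word.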
I want to insert a newline after the following pattern
lcl|NC_005966.1_gene_750
The last number (750 in this case) changes; the numbers range from 1 to 3407.
How can I tell sed to keep this pattern together and not split them after the first number?
So far I have found
sed 's/lcl|NC_005966.1_gene_[[:digit:]]/&\n/g' file
but this breaks the line after the first digit.
Try:
sed 's/lcl|NC_005966.1_gene_[[:digit:]]*/&\n/g' file
(note the *)
Alternatively, you could say:
sed '/lcl|NC_005966.1_gene_[[:digit:]]/G' file
which appends an empty line after every line that contains the pattern (the break comes at the end of the line, not directly after the match).
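To see what G does here, consider this sketch. G appends a newline plus the contents of the hold space to the pattern space; the hold space is empty unless the script fills it, so the visible effect is a blank line after each matching line. The name blank_after_match and the sample input are made up, and the dot is escaped as \. so it matches only a literal period.

```shell
# Sketch: G adds an empty line after each line containing the pattern.
# blank_after_match is a hypothetical helper name; it reads stdin.
blank_after_match() {
    sed '/lcl|NC_005966\.1_gene_[[:digit:]]/G'
}

printf '%s\n' 'a lcl|NC_005966.1_gene_7 b' 'other' | blank_after_match
```

The text after the match (" b" above) stays on the same line, which is the difference from the s/.../&\n/ form.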
sed 's/lcl|NC_005966\.1_gene_[[:digit:]][[:digit:]]*/&\
/g' file
You need to escape the . because it's an RE metacharacter, and [[:digit:]][[:digit:]]* represents one or more digits. Use \ followed by a literal newline in the replacement for portability across seds.
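The portable form can be tried on a single line with several IDs, as in this sketch. The name break_after_gene_id and the sample sequence text are made up; the sed command is the one above, reading stdin instead of file.

```shell
# Sketch: insert a line break after every complete gene ID.
# break_after_gene_id is a hypothetical helper name; it reads stdin.
break_after_gene_id() {
    # The replacement ends in backslash + literal newline (portable across seds).
    sed 's/lcl|NC_005966\.1_gene_[[:digit:]][[:digit:]]*/&\
/g'
}

printf 'lcl|NC_005966.1_gene_1AAlcl|NC_005966.1_gene_3407CC\n' | break_after_gene_id
```

Because [[:digit:]][[:digit:]]* is greedy, the whole number (1 or 3407) stays attached to the ID before the break.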