Sed command to break a comma-separated string up to a certain length

Example string
TEST,TEST1,TEST3,TEST4,TEST5
Expected output :
TEST,TEST1,
TEST3,TEST4,
TEST5
I want to split the data at the last comma before the 15th position.

Try this:
sed 's/.\{,15\},/&\n/g' <<< "string" # or
sed 's/.\{,15\},/&\n/g' file
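.\{,15\}, matches a part of the input consisting of 0 to 15 characters followed by a comma. Since sed matches greedily, it will match as many characters as it can. The \{,15\} form without a lower bound is a GNU extension; \{0,15\} is the portable spelling.
&\n expands to the matched part followed by a line feed (GNU sed understands \n in the replacement).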
s/REGEXP/REPLACEMENT/g replaces every match against REGEXP with REPLACEMENT.
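As a quick check, the command can be run against the sample string from the question. This is a minimal sketch assuming GNU sed (both the \{,15\} interval and \n in the replacement rely on it); the variable names input and out are only for illustration.

```shell
# Sample string from the question.
input='TEST,TEST1,TEST3,TEST4,TEST5'

# Insert a line feed after each greedy run of up to 15 characters plus a comma (GNU sed).
out=$(printf '%s\n' "$input" | sed 's/.\{,15\},/&\n/g')

printf '%s\n' "$out"
```

The final TEST5 is left untouched because no comma follows it.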

Related

How to replace a character using sed with different lengths in preceding string

I have a file in which I want to replace the "_" string with "-" in cases where it makes up a part of my gene name. Examples of the gene names and my intended output are:
aa1c1_123 -> aa1c1-123
aa1c2_456 -> aa1c2-456
aa1c10_789 -> aa1c10-789
In essence, the first four characters are fixed, followed by 1 or 2 characters depending on the chromosome, an underscore, and then the remainder of the gene ID, which can vary in length and characters. Importantly, this gene information column also contains other strings with underscores (e.g. "gene_id", "transcript_id", "five_prime_utr"), so using sed -i.bak 's/_/-/g' file.gtf
can't be done.
Perhaps not the most elegant way, but this should work:
sed -i.bak 's/\([0-9a-z]\{4\}[0-9][0-9]\?\)_/\1-/g' file.gtf
i.e. capture a group (referenced by \1 in the substitution) of 4 characters consisting of lowercase letters and digits, followed by exactly one digit and optionally a second digit, which is followed by an underscore; if found, replace it with the group's content and a dash. This should exclude your other occurrences, which consist only of letters and an underscore. Note that \? is a GNU sed extension.
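To see the effect without touching a file, the same substitution can be tried on a single line. This is a sketch assuming GNU sed; the sample line is hypothetical and only mimics the kind of attribute text described in the question.

```shell
# Hypothetical attribute text; only the gene names should change.
line='gene_id "aa1c1_123"; transcript_id "aa1c10_789"; five_prime_utr'

# Replace "_" with "-" only after 4 fixed characters plus 1 or 2 digits.
out=$(printf '%s\n' "$line" | sed 's/\([0-9a-z]\{4\}[0-9][0-9]\?\)_/\1-/g')

printf '%s\n' "$out"
```

The underscores in gene_id, transcript_id, and five_prime_utr survive because none of them is preceded by a digit.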

How to use Sed to change letter to uppercase in first and second column in text file to upper case

I have text file input.txt which has
april,december,month.gmail.com
lion,tiger,animal.gmail.com
How can I use sed to change the first and second columns to uppercase?
With GNU sed:
sed 's/^[a-z]*,[a-z]*,/\U&/' file
s: substitute command
[a-z]*,: matches zero or more lowercase letters followed by a ,. The pattern is repeated for the second field
the \U sequence turns the replacement to uppercase
\U is applied to &, which references the matched string
or, if there are only three comma-separated fields:
sed 's/^[a-z].*,/\U&/' file
output:
APRIL,DECEMBER,month.gmail.com
LION,TIGER,animal.gmail.com
As @Sundeep suggests, the second sed command can be shortened to:
s/^.*,/\U&/
which converts all characters up to the last , on the line
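The difference between the explicit and the greedy pattern only shows up when a line has more than three fields. The sketch below assumes GNU sed (for \U) and uses a hypothetical four-field line that is not in the original file.

```shell
# Hypothetical line with four comma-separated fields.
line='april,december,june,month.gmail.com'

# Explicit version: only the first two fields are uppercased.
two=$(printf '%s\n' "$line" | sed 's/^[a-z]*,[a-z]*,/\U&/')

# Greedy version: everything up to the last comma is uppercased.
greedy=$(printf '%s\n' "$line" | sed 's/^.*,/\U&/')

printf '%s\n' "$two" "$greedy"
```

So the shortened forms are safe only when the input really has exactly three fields.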
For more on GNU sed substitution command, see this article

SED command to remove words at the end of the string

I want to remove the last 2 words of a string which is in a file.
I first tried this command to delete just the last word, but it did not work. Can someone help me?
sed 's/\w*$//' <file name>
my strings are like this
Input:
asbc/jahsf/jhdsflk/jsfh/ -0.001 (exam)
I want to remove both numerical value and the one in brackets.
Output:
asbc/jahsf/jhdsflk/jsfh/
Using GNU sed:
$ sed -r 's/([[:space:]]+[-+.()[:alnum:]]+){2}$//' file
asbc/jahsf/jhdsflk/jsfh/
How it works
[[:space:]]+ matches one or more spaces.
[-+.()[:alnum:]]+ matches the 'words' which are allowed to contain any number of plus or minus signs, periods, parens, or any alphanumeric characters.
Note that, when a period is inside square brackets, [.], it is just a period, not a wildcard: it does not need to be escaped.
([[:space:]]+[-+.()[:alnum:]]+) matches one or more spaces followed by a word.
([[:space:]]+[-+.()[:alnum:]]+){2}$ matches two words and the spaces which precede them.
Note the use of character classes like [:space:] and [:alnum:]. Unlike old-fashioned ranges like [a-zA-Z0-9], these classes are Unicode-safe.
OSX (BSD) sed
The above was tested on GNU sed. For BSD sed, try:
sed -E 's/([[:space:]][[:space:]]*[-+.()[:alnum:]][-+.()[:alnum:]]*){2}$//' file
To remove everything that follows a number with decimal places
This looks for a decimal number with optional sign and removes it, the spaces which precede it, and everything which follows it:
$ sed -r 's/[[:space:]]+[-+]?[[:digit:]]+[.][[:digit:]]+[[:space:]].*//' file
asbc/jahsf/jhdsflk/jsfh/
How it works:
[[:space:]]+ matches one or more spaces
[-+]? matches zero or one signs.
[[:digit:]]+ matches one or more digits.
[.] matches a decimal point (period).
[[:digit:]]+ matches one or more digits following the decimal point.
[[:space:]] matches a space following the number.
.* matches anything which follows.
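The pieces above can be exercised on a few lines to see where the pattern does and does not apply. This is a sketch assuming GNU sed (-E is equivalent to -r there); the second and third sample lines are hypothetical.

```shell
# Line from the question, with extra trailing words added for illustration.
with_tail='asbc/jahsf/jhdsflk/jsfh/ -0.001 (exam) extra words'
# Hypothetical line without any decimal number.
no_number='asbc/jahsf/ (exam)'
# Hypothetical line where the number is the very last thing on the line.
at_end='asbc/jahsf/ 1.5'

re='s/[[:space:]]+[-+]?[[:digit:]]+[.][[:digit:]]+[[:space:]].*//'

out1=$(printf '%s\n' "$with_tail" | sed -E "$re")
out2=$(printf '%s\n' "$no_number" | sed -E "$re")
out3=$(printf '%s\n' "$at_end" | sed -E "$re")

printf '%s\n' "$out1" "$out2" "$out3"
```

The third line is left unchanged because the pattern requires a space after the number, so a decimal at the very end of a line is not removed.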
It looks like there is a tab between what you want to keep and what you want to get rid of. I don't have Linux in front of me, but try this:
sed 's/\t.*//'
This assumes your strings are always formatted similarly, which is what I take from your comment.
This might work for you (GNU sed):
sed -r 's/\s+\S+\s+\S+\s*$//' file
or if you prefer:
sed -r 's/(\s+\S+){2}\s*$//' file
This matches and removes: one or more whitespaces followed by one or more non-whitespaces twice followed by zero or more whitespaces at the end of the line.
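A small check of the \s*$ part, which lets the command cope with trailing whitespace. This sketch assumes GNU sed (for -r, \s and \S); the trailing spaces on the sample line are added for illustration.

```shell
# Line from the question with three trailing spaces appended.
line='asbc/jahsf/jhdsflk/jsfh/ -0.001 (exam)   '

# Remove the last two whitespace-separated words and any trailing whitespace.
out=$(printf '%s\n' "$line" | sed -r 's/(\s+\S+){2}\s*$//')

# Brackets make any leftover whitespace visible.
printf '[%s]\n' "$out"
```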

Sed - Printing a pattern in a line matched more than once

Input-
X's Score 1725 and Y's Score 6248 in the match number 576
I want sed to output-
1725
6248
My code-
sed 's/Score[[:space:]]\([0-9]+\)/\1/g'
The above code outputs -
1725 and Y's 6248 in the match
You could try the following sed commands
#!/bin/sed -f
s/Score\s*/\
/g
s/\n\([0-9]\+\)[^\n]*/\
\1/g
s/^[^\n]*\n//
The first command replaces every "Score" with a newline, so now all numbers are at the beginning of a line. To insert a newline character, we must write a backslash followed by an actual line break. That's why the command spans two lines.
The second command removes everything after the numbers that are at the beginning of a line. It matches a newline character followed by a number (this is how we know that this number was prefixed by a "Score" string). The number is captured into \1. Then it skips all characters up to the next newline character. When writing the replacement, we must restore the newline character and the number that was captured into \1.
Because the first line contains text before the first "Score", we must remove it. That's what the last command does: it matches all characters up to the first newline, starting from the beginning of the contents of the pattern space (i.e. our working buffer).
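The three steps can be watched one at a time by capturing the intermediate results. This is a sketch assuming GNU sed: it writes \n in the replacement instead of a backslash followed by a line break, which GNU sed accepts, and the variable names step1, step2, and out are only for illustration.

```shell
input="X's Score 1725 and Y's Score 6248 in the match number 576"

# Step 1: turn every "Score" (plus trailing spaces) into a newline.
step1=$(printf '%s\n' "$input" | sed 's/Score\s*/\n/g')

# Steps 1-2: keep only the digits that directly follow each newline.
step2=$(printf '%s\n' "$input" |
  sed -e 's/Score\s*/\n/g' -e 's/\n\([0-9]\+\)[^\n]*/\n\1/g')

# Steps 1-3: also drop the text before the first newline.
out=$(printf '%s\n' "$input" |
  sed -e 's/Score\s*/\n/g' -e 's/\n\([0-9]\+\)[^\n]*/\n\1/g' -e 's/^[^\n]*\n//')

printf '%s\n' "$out"
```

After step 2 the pattern space still starts with the leftover "X's " text, which is why the third substitution is needed.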
In a single command:
sed -e 's/Score\s*/\
/g;s/\n\([0-9]\+\)[^\n]*/\
\1/g;s/^[^\n]*\n//'
Hope this helps =)
One way, which requires GNU sed because \b (which matches a word boundary) is an extension:
echo "X's Score 1725 and Y's Score 6248 in the match number 576" | sed -e '
## Surround searched numbers (preceded by "Score") with newline characters.
s/\bScore \([0-9]\+\)\b/\n\1\n/g;
## Delete all numbers not preceded by a newline character.
s/\([^\n0-9]\)[0-9]\+/\1/g;
## Remove all other characters but numbers and newlines.
s/[^0-9\n]\+//g;
## Remove extra newlines.
s/\n\([0-9]\)/\1/g;
s/\n$//
'
It yields:
1725
6248
You could chain two grep -E calls (egrep is the older spelling of the same command):
<infile grep -Eo 'Score [0-9]+' | grep -Eo '[0-9]+$'

sed: change word order and replace

I'm trying to replace
randomtext{{XX icon}}
by
randomtext{{ref-XX}}
..in a file, where XX could be any sequence of 2 or 3 lowercase letters.
I attempted rearranging the word order with awk before replacing "icon" with "ref-" with sed:
awk '{print $2, $1}'
..but since there is no space before the first word nor after the second one, it messed up the curly brackets:
icon}} {{XX
What is the simplest way to achieve this using sed?
sed 's/{{\([a-z]\{2,3\}\)\sicon/{{ref-\1/'
This one-liner uses the substitute command s/PATTERN/REPLACE/. {{ matches two opening curly braces. \([a-z]\{2,3\}\) captures 2 or 3 lowercase letters. \s matches a whitespace character (a GNU sed extension). icon matches the literal string "icon". Then we replace the match, that is, {{....icon, with the literal string {{ref- and the captured 2 or 3 letter word.
Here's a more generic version using hash signs (#) as the regex delimiter:
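A short run shows which inputs the strict pattern accepts. This sketch assumes GNU sed (for \s); the sample lines are hypothetical.

```shell
# Two lines that should change and one (4 letters) that should not.
out=$(printf '%s\n' \
  'randomtext{{en icon}}' \
  'other{{abc icon}} text' \
  'skip{{abcd icon}}' |
  sed 's/{{\([a-z]\{2,3\}\)\sicon/{{ref-\1/')

printf '%s\n' "$out"
```

The last line is left alone because four letters follow the braces, so \{2,3\} cannot be followed directly by whitespace.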
sed 's#{{\([^ ]*\) [^}]*#{{ref-\1#'
{{ anchors the regex at the double open curly braces.
\([^ ]*\) captures everything up until a space.
[^}]* eats everything up until a closing curly brace.
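Because this version does not check the letters or the word "icon", it rewrites more than the original request asked for. A sketch with hypothetical sample lines:

```shell
# The generic pattern accepts any first word and any second word.
out=$(printf '%s\n' \
  'randomtext{{XX icon}}' \
  'skip{{abcd icon}}' \
  'other{{fr flag}}' |
  sed 's#{{\([^ ]*\) [^}]*#{{ref-\1#')

printf '%s\n' "$out"
```

Whether that looseness is acceptable depends on what else appears between double braces in the file.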