Replace a string with sed script - sed

Input:
Proc Natl Acad Sci U S A. 2014 May 27;111(21):7819-24. doi: 10.1073/pnas.1400586111. Epub 2014 May 13.
Desired output:
Proc Natl Acad Sci U S A. 2014 May 27;111(21):7819-24.
What I tried:
sed 's/doi: *//'

Use
sed 's/doi: .*//'
In the pattern you tried, the * applies to the space before it, so doi: followed by an arbitrary number of spaces is removed, and what comes after that remains.
.*, by contrast, matches an arbitrary number of arbitrary characters (because . in a regex matches any character), and doi: .* matches doi: followed by a space and then all characters until the end of the line.
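To see the difference side by side, here is a small sketch that runs both patterns on the sample line. The third variant, which also removes the spaces in front of doi:, is an assumption about what is wanted: the desired output in the question has no trailing space.

```shell
line='Proc Natl Acad Sci U S A. 2014 May 27;111(21):7819-24. doi: 10.1073/pnas.1400586111. Epub 2014 May 13.'

# Original attempt: "*" applies only to the space, so just "doi: " is removed.
attempt=$(printf '%s\n' "$line" | sed 's/doi: *//')

# Fix from the answer: ".*" consumes the rest of the line
# (the space in front of "doi:" is left behind).
fixed=$(printf '%s\n' "$line" | sed 's/doi: .*//')

# Assumed variant: also drop the spaces that precede "doi:".
trimmed=$(printf '%s\n' "$line" | sed 's/ *doi: .*//')

# Brackets make any trailing space visible.
printf '[%s]\n' "$attempt" "$fixed" "$trimmed"
```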

Related

Using sed to replace a number located between two other numbers

I need to replace a numeric value, that occurs in a specific line of a series of config files in a pattern like this:
string number_1 number_to_replace number_2
I want to obtain something like this:
string number_1 number_replaced number_2
The difficulties I encountered are:
number_1 or number_2 can be equal to number_to_replace, so a simple replacement is not possible.
number_1 and number_2 vary between config files so I don't know them in advance.
The closest attempt I got until now is:
echo "field 4 4 4" | sed 's/\s4\s/3/'
Which outputs:
field34 4
This is close. Since I want to replace the middle number, I added another "\s" to try to use the known fact that the line starts with a character:
echo "field 4 4 4" | sed 's/\s\s4\s/3/'
Which gives:
field 4 4 4
So, nothing is replaced this time. How can I proceed? A somewhat detailed explanation would be ideal, because my knowledge of replacing expressions that involve patterns is nearly zero.
Thanks.
You can do something like the following, which matches the sequence of three numbers in your example. You can replace 3 with any value of your choice.
sed 's/\([0-9]\{1,\}\)[[:space:]]\([0-9]\{1,\}\)[[:space:]]\([0-9]\{1,\}\)/\1 3 \3/'
Notice that I've used the POSIX bracket expression [[:space:]] to match a whitespace character, which should be supported in any variant of sed you are using. \s, by contrast, is a GNU extension and is not portable.
The regex matches one or more digits, a whitespace character, one or more digits, another whitespace character, and again one or more digits. The three numbers are captured as \1, \2 and \3. Since your intention is to replace the 2nd number, the replacement keeps \1 and \3 and puts the value of your choice between them.
If the extra escapes make it unreadable, use the -E flag for extended regex support. I've used the default BRE syntax here.
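As a rough illustration of the two syntaxes, the sketch below runs the BRE command from this answer and an -E rewrite of it. The -E version is my own variant rather than part of the answer: it captures only the outer two numbers (so they are \1 and \2), and the second sample line is made up to show that multi-digit numbers are handled.

```shell
# BRE form from the answer, applied to the sample line from the question.
bre=$(printf 'field 4 4 4\n' |
  sed 's/\([0-9]\{1,\}\)[[:space:]]\([0-9]\{1,\}\)[[:space:]]\([0-9]\{1,\}\)/\1 3 \3/')

# Same idea with -E (ERE): no backslashes on groups, "+" means "one or more".
# Only the first and last numbers are captured, so they are \1 and \2 here.
# "field 10 10 25" is a hypothetical line, not taken from the question.
ere=$(printf 'field 10 10 25\n' |
  sed -E 's/([0-9]+)[[:space:]]+[0-9]+[[:space:]]+([0-9]+)/\1 3 \2/')

printf '%s\n' "$bre" "$ere"
```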

Invalid reference \1 using sed when trying to print matching expression

Before I start, I already looked at this question, but it seems the solution was that they were not escaping the parentheses in their regex. I'm getting the same error, but I'm not grouping a regex. What I want to do is find all names/usernames in a lastlog file and return the UNs ONLY.
What I have:
s/^[a-z]+ |^[a-z]+[0-9]+/\1/p
I've seen many solutions that show how to do it in awk, which is great for future reference, but I want to do it using sed.
Edit for example input:
dzhu pts/15 n0000d174.cs.uts Wed Feb 17 08:31:22 -0600 2016
krobbins **Never logged in**
js24 **Never logged in**
You cannot use backreferences (such as \1) if you do not have any capture groups in the first part of your substitution command.
Assuming you want the first word in the line, here's a command you can run:
sed -n 's/^\s*\(\w\+\)\s\?.*/\1/p'
Explanation:
-n suppresses the default behavior of sed to print each line it processes
^\s* matches the start of the line followed by any amount of whitespace
\(\w\+\) captures one or more word characters (letters, digits and underscores)
\s\?.* matches zero or one whitespace character, followed by any number of characters. This consumes the rest of the line, so the substitution replaces the whole line and not just the first word
\1 replaces the matched line with the captured group
The p flag prints lines that matched the expression. Combined with -n, this means only matches get printed out.
I hope this helps!
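The command above relies on \s, \w, \+ and \? in a basic regex, which are GNU extensions. A portable sketch of the same idea, assuming a username consists only of letters, digits and underscores, could look like this (the sample lines are the ones from the question):

```shell
lastlog_sample='dzhu pts/15 n0000d174.cs.uts Wed Feb 17 08:31:22 -0600 2016
krobbins **Never logged in**
js24 **Never logged in**'

# \( ... \) is the capture group that \1 refers to; without it, sed reports
# an invalid reference. [[:alnum:]_]\{1,\} stands in for GNU's \w\+.
users=$(printf '%s\n' "$lastlog_sample" |
  sed -n 's/^[[:space:]]*\([[:alnum:]_]\{1,\}\).*/\1/p')

printf '%s\n' "$users"
```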

SED command to remove words at the end of the string

I want to remove last 2 words in the string which is in a file.
I am using this command to try to delete the last word first, but it doesn't work. Can someone help me?
sed 's/\w*$//' <file name>
my strings are like this
Input:
asbc/jahsf/jhdsflk/jsfh/ -0.001 (exam)
I want to remove both numerical value and the one in brackets.
Output:
asbc/jahsf/jhdsflk/jsfh/
Using GNU sed:
$ sed -r 's/([[:space:]]+[-+.()[:alnum:]]+){2}$//' file
asbc/jahsf/jhdsflk/jsfh/
How it works
[[:space:]]+ matches one or more spaces.
[-+.()[:alnum:]]+ matches the 'words' which are allowed to contain any number of plus or minus signs, periods, parens, or any alphanumeric characters.
Note that, when a period is inside square brackets, [.], it is just a period, not a wildcard: it does not need to be escaped.
([[:space:]]+[-+.()[:alnum:]]+) matches one or more spaces followed by a word.
([[:space:]]+[-+.()[:alnum:]]+){2}$ matches two words and the spaces which precede them.
Note the use of character classes like [:space:] and [:alnum:]. Unlike old-fashioned ranges like [a-zA-Z0-9], these classes are Unicode-safe.
OSX (BSD) sed
The above was tested on GNU sed. For BSD sed, try:
sed -E 's/([[:space:]][[:space:]]*[-+.()[:alnum:]][-+.()[:alnum:]]*){2}$//' file
To remove everything that follows a number with decimal places
This looks for a decimal number with optional sign and removes it, the spaces which precede it, and everything which follows it:
$ sed -r 's/[[:space:]]+[-+]?[[:digit:]]+[.][[:digit:]]+[[:space:]].*//' file
asbc/jahsf/jhdsflk/jsfh/
How it works:
[[:space:]]+ matches one or more spaces
[-+]? matches zero or one signs.
[[:digit:]]+ matches one or more digits.
[.] matches a decimal point (period).
[[:digit:]]+ matches one or more digits following the decimal point.
[[:space:]] matches a space following the number.
.* matches anything which follows.
It looks like there is a tab between what you want to keep and what you want to get rid of. I don't have Linux in front of me, but try this.
sed 's/\t.*//'
This assumes your strings are always formatted similarly, which is what I take from your comment.
This might work for you (GNU sed):
sed -r 's/\s+\S+\s+\S+\s*$//' file
or if you prefer:
sed -r 's/(\s+\S+){2}\s*$//' file
This matches and removes: one or more whitespaces followed by one or more non-whitespaces twice followed by zero or more whitespaces at the end of the line.
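The GNU commands above depend on -r, \s and \S. A sketch of the same "two trailing words" idea in plain POSIX BRE, which should behave the same on GNU and BSD sed, might be written as follows. It assumes a "word" is any run of non-whitespace characters.

```shell
line='asbc/jahsf/jhdsflk/jsfh/ -0.001 (exam)'

# \( whitespace+ non-whitespace+ \) repeated exactly twice, then optional
# trailing whitespace up to the end of the line.
trimmed=$(printf '%s\n' "$line" |
  sed 's/\([[:space:]]\{1,\}[^[:space:]]\{1,\}\)\{2\}[[:space:]]*$//')

# Brackets make it obvious that no trailing space is left behind.
printf '[%s]\n' "$trimmed"
```

A line with fewer than two trailing words does not match and is left unchanged.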

Replace particular string at fixed position using sed

I have a simple text file containing several lines of data where each line has got exactly 26 characters.
E.g.
10001340100491938001945591
10001340100491951002049591
10001340100462055002108507
10001340100492124002135591
10001340100492145002156507
10001340100472204002205591
Now, I want to replace the 12th and 13th characters with 58, but only if these two characters are 49.
I tried it like this:
sed 's/^(.{12})49/\58/' data.txt
but am getting this error:
sed: -e expression #1, char 18: invalid reference \5 on `s' command's RHS
Thanks in advance
The captured group is \1, and it has to hold the first 11 (not 12) characters; the replacement is then \1 followed by 58:
sed -E 's/^(.{11})49/\158/' data.txt
You also need -E or -r if you don't want to escape the parentheses and curly braces.
With your input, the command changes 49 to 58 in lines 1, 2, 4 and 5.
If you like to try an awk solution:
awk 'substr($0,12,2)==49 {$0=substr($0,1,11)"58"substr($0,14)}1' file
10001340100581938001945591
10001340100581951002049591
10001340100462055002108507
10001340100582124002135591
10001340100582145002156507
10001340100472204002205591
In your expression you are matching the first 14 characters (12 plus the 49), assuming the grouping had worked as intended. But you need to match 11 characters plus the two you want to change (the 12th and 13th). More importantly, without -E or -r, sed needs backslashes in front of the grouping parentheses and the curly braces. Finally, the number of the capture group is 1, and you omitted that number: \58 is read as a reference to group 5, which does not exist.
Putting it all together, you get
sed 's/^\(.\{11\}\)49/\158/'
There are flags you can use in sed to make the expressions more "regular" (changing the meaning of ( vs \(). Which flag that is depends on your platform (version of sed): -E on BSD sed and recent GNU sed, -r on GNU sed.
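As a quick check that the BRE and -E spellings are equivalent and that the match is tied to columns 12 and 13, here is a small sketch. The first two data lines are from the question; the third is a made-up line with 49 in columns 7 and 8 only, which should stay untouched.

```shell
data='10001340100491938001945591
10001340100462055002108507
10001349100002124002135591'   # third line is hypothetical: 49 at columns 7-8

# BRE: groups and intervals need backslashes.
bre=$(printf '%s\n' "$data" | sed 's/^\(.\{11\}\)49/\158/')

# ERE: the same expression with -E, no backslashes needed.
ere=$(printf '%s\n' "$data" | sed -E 's/^(.{11})49/\158/')

printf '%s\n' "$bre"
```

In both forms \158 is read as backreference \1 followed by the literal text 58, because sed backreferences are a single digit.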

Sed - Printing a pattern in a line matched more than once

Input-
X's Score 1725 and Y's Score 6248 in the match number 576
I want sed to output-
1725
6248
My code-
sed 's/Score[[:space:]]\([0-9]+\)/\1/g'
The above code outputs -
1725 and Y's 6248 in the match
You could try the following sed commands
#!/bin/sed -f
s/Score\s*/\
/g
s/\n\([0-9]\+\)[^\n]*/\
\1/g
s/^[^\n]*\n//
The first command replaces all "Score"s with newlines, so now all numbers are at the beginning of a line. To insert a newline character, we must write a backslash followed by an actual line break. That's why the command spans two lines.
The second command removes everything after the numbers that are at the beginning of a line. It matches a newline character followed by a number (this is how we know that this number was prefixed by a "Score" string). The number is captured into group \1. Then it skips all characters up to the next newline character. When writing the replacement, we must restore the newline character and the number that was captured into \1.
Because the first line contains text before the first "Score", we must remove it. That's what the last command does: it matches all characters up to the first newline, starting from the beginning of the contents of the pattern space (i.e. our working buffer).
In a single command:
sed -e 's/Score\s*/\
/g;s/\n\([0-9]\+\)[^\n]*/\
\1/g;s/^[^\n]*\n//'
Hope this helps =)
One way, using GNU sed because \b (which matches a word boundary) is an extension:
echo "X's Score 1725 and Y's Score 6248 in the match number 576" | sed -e '
## Surround searched numbers (preceded by "Score") with newline characters.
s/\bScore \([0-9]\+\)\b/\n\1\n/g;
## Delete all numbers not preceded by a newline character.
s/\([^\n0-9]\)[0-9]\+/\1/g;
## Remove all other characters but numbers and newlines.
s/[^0-9\n]\+//g;
## Remove extra newlines.
s/\n\([0-9]\)/\1/g;
s/\n$//
'
It yields:
1725
6248
You could chain two egreps (use grep -E where egrep is deprecated):
<infile egrep -o 'Score [0-9]+' | egrep -o '[0-9]+$'
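Another possibility, different from the answers above and offered only as a sketch: put every word on its own line with tr, then let sed print the line that follows each "Score" line. This assumes that words are separated by spaces and that the number is the word directly after "Score". It avoids GNU-only regex features.

```shell
line="X's Score 1725 and Y's Score 6248 in the match number 576"

# tr -s ' ' '\n' : one word per line (runs of spaces are squeezed).
# /^Score$/{n;...} : on a "Score" line, load the next line into the pattern space,
# then print it only if it consists of digits.
scores=$(printf '%s\n' "$line" |
  tr -s ' ' '\n' |
  sed -n '/^Score$/{n;/^[0-9][0-9]*$/p;}')

printf '%s\n' "$scores"
```

The trailing 576 is not printed, because the word in front of it is "number" rather than "Score".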