sed command to remove words at the end of a string

I want to remove the last two words from each string in a file.
I tried this command first, to delete just the last word, but it didn't work. Can someone help me?
sed 's/\w*$//' <file name>
my strings are like this
Input:
asbc/jahsf/jhdsflk/jsfh/ -0.001 (exam)
I want to remove both numerical value and the one in brackets.
Output:
asbc/jahsf/jhdsflk/jsfh/

Using GNU sed:
$ sed -r 's/([[:space:]]+[-+.()[:alnum:]]+){2}$//' file
asbc/jahsf/jhdsflk/jsfh/
How it works
[[:space:]]+ matches one or more spaces.
[-+.()[:alnum:]]+ matches a 'word': one or more characters, each of which may be a plus or minus sign, a period, a paren, or an alphanumeric character.
Note that, when a period is inside square brackets, [.], it is just a period, not a wildcard: it does not need to be escaped.
([[:space:]]+[-+.()[:alnum:]]+) matches one or more spaces followed by a word.
([[:space:]]+[-+.()[:alnum:]]+){2}$ matches two words and the spaces which precede them.
Note the use of character classes like [:space:] and [:alnum:]. Unlike old-fashioned ranges like [a-zA-Z0-9], these classes follow the current locale, so they are Unicode-safe.
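As a minimal sketch, the command above can be tried on the sample line from the question without creating a file. The variable names line and result are only for illustration, and -E is used instead of -r on the assumption that your sed accepts it (current GNU and BSD versions both do):

```shell
# Sample line taken from the question.
line='asbc/jahsf/jhdsflk/jsfh/ -0.001 (exam)'

# -E selects extended regular expressions, so + and {2} work unescaped.
result=$(printf '%s\n' "$line" | sed -E 's/([[:space:]]+[-+.()[:alnum:]]+){2}$//')

printf '%s\n' "$result"   # prints: asbc/jahsf/jhdsflk/jsfh/
```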
OSX (BSD) sed
The above was tested on GNU sed. For BSD sed, try:
sed -E 's/([[:space:]][[:space:]]*[-+.()[:alnum:]][-+.()[:alnum:]]*){2}$//' file
To remove everything that follows a number with decimal places
This looks for a decimal number with optional sign and removes it, the spaces which precede it, and everything which follows it:
$ sed -r 's/[[:space:]]+[-+]?[[:digit:]]+[.][[:digit:]]+[[:space:]].*//' file
asbc/jahsf/jhdsflk/jsfh/
How it works:
[[:space:]]+ matches one or more spaces
[-+]? matches zero or one signs.
[[:digit:]]+ matches one or more digits.
[.] matches a decimal point (period).
[[:digit:]]+ matches one or more digits following the decimal point.
[[:space:]] matches a space following the number.
.* matches anything which follows.
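The pieces listed above can be checked against a few inputs. This is only a sketch: the helper name strip_after_decimal and the second and third sample lines are made up for illustration, and -E is assumed to be available in place of -r. Note that, because the pattern requires a space after the number, a decimal at the very end of a line is left alone:

```shell
# Hypothetical helper wrapping the substitution from the answer.
strip_after_decimal() {
  sed -E 's/[[:space:]]+[-+]?[[:digit:]]+[.][[:digit:]]+[[:space:]].*//'
}

# Decimal number followed by more text: number and the rest are removed.
with_decimal=$(printf '%s\n' 'asbc/jahsf/jhdsflk/jsfh/ -0.001 (exam)' | strip_after_decimal)

# No decimal point in the number: the line is left unchanged.
no_decimal=$(printf '%s\n' 'asbc/jahsf/ 12 (exam)' | strip_after_decimal)

# Decimal number at the very end, with no space after it: also unchanged.
at_end=$(printf '%s\n' 'asbc/jahsf/ -0.001' | strip_after_decimal)

printf '%s\n' "$with_decimal" "$no_decimal" "$at_end"
```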

It looks like there is a tab between what you want to keep and what you want to get rid of. I don't have Linux in front of me, but try this:
sed 's/\t.*//'
This assumes your strings are always formatted similarly, which is what I take from your comment.

This might work for you (GNU sed):
sed -r 's/\s+\S+\s+\S+\s*$//' file
or if you prefer:
sed -r 's/(\s+\S+){2}\s*$//' file
This matches and removes, twice over, one or more whitespace characters followed by one or more non-whitespace characters, and then zero or more whitespace characters at the end of the line.
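For seds that lack \s and \S, the same idea can be spelled with POSIX character classes. The following is a sketch under that assumption (the function name drop_last_two and the second sample line are invented for the example):

```shell
# Hypothetical helper: [[:space:]] stands in for \s, [^[:space:]] for \S.
drop_last_two() {
  sed -E 's/([[:space:]]+[^[:space:]]+){2}[[:space:]]*$//'
}

# Sample line from the question, with trailing spaces added on purpose.
first=$(printf '%s\n' 'asbc/jahsf/jhdsflk/jsfh/ -0.001 (exam)   ' | drop_last_two)

# A made-up line of four plain words.
second=$(printf '%s\n' 'one two three four' | drop_last_two)

printf '%s\n' "$first" "$second"
```

Unlike the first answer, this version treats any run of non-whitespace as a word, so it does not depend on which punctuation the words contain.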

gnu sed remove portion of line after pattern match with special characters

The goal is to use sed to return only the url from each line of FF extension Mining Blocker which uses this format for its regex lines:
{"baseurl":"*://002.0x1f4b0.com/*", "suburl":"*://*/002.0x1f4b0.com/*"},
{"baseurl":"*://003.0x1f4b0.com/*", "suburl":"*://*/003.0x1f4b0.com/*"},
the result should be:
002.0x1f4b0.com
003.0x1f4b0.com
One way would be to keep everything after suburl":"*://*/ then remove each occurrence of /*"},
I found https://unix.stackexchange.com/questions/24140/return-only-the-portion-of-a-line-after-a-matching-pattern but the special characters are a problem.
this won't work:
sed -n -e s#^.*suburl":"*://*/##g hosts
Would someone please show me how to mark the 2 asterisks in the string so they are seen by regex as literal characters, not wildcards?
edit:
sed -n 's#.*://\*/\([^/]\+\)/.*#\1#p' hosts
doesn't work, unfortunately.
regarding character substitution, thanks for directing me to the references.
I reduced the searched-for string to //*/ and used ASCII character codes like this:
sed -n -e s#^.*\d047\d047\d042\d047##g hosts
Unfortunately, that didn't output any changes to the lines.
My assumptions are:
^.*something specifies everything up to and including the last occurrence of "something" in a line
sed -n -e s#search##g deletes (replace with nothing) "search" within a line
So, this line:
sed -n -e s#^.*\d047\d047\d042\d047##g hosts
Should output everything after //*/ in each line...except it doesn't.
What is incorrect with that line?
Regarding deleting everything including and after the first / AFTER that first operation, yes, that's wanted too.
This might work for you (GNU sed):
sed -n 's#.*://\*/\([^/]\+\)/.*#\1#p' file
Match greedily (the longest string that matches) all characters up to ://*/, followed by a group of characters (which will be referred to as \1) that do not match a /, followed by the rest of the line and replace it by the group \1.
N.B. the sed substitution delimiters are arbitrary; here # was chosen so as to make matching / easier. Also, the character * on the left-hand side of the substitution command is normally a metacharacter meaning zero or more of the previous character or group, so it is quoted as \* to make it match a literal asterisk. Finally, the -n option turns off the usual printing of the pattern space after all the sed commands have been executed. The p flag on the substitution command prints the pattern space following a successful substitution, so the output contains only the URLs, or nothing if no line matches.
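A quick way to check this is to feed the two sample lines from the question through the command. This sketch writes \([^/][^/]*\) instead of \([^/]\+\), on the assumption that you may want it to work on seds without the GNU \+ extension; the variable name hosts_out is illustrative:

```shell
# The two sample lines from the question, passed on stdin instead of a file.
hosts_out=$(printf '%s\n' \
  '{"baseurl":"*://002.0x1f4b0.com/*", "suburl":"*://*/002.0x1f4b0.com/*"},' \
  '{"baseurl":"*://003.0x1f4b0.com/*", "suburl":"*://*/003.0x1f4b0.com/*"},' |
  sed -n 's#.*://\*/\([^/][^/]*\)/.*#\1#p')

# One host name per line.
printf '%s\n' "$hosts_out"
```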

sed character to leave a match untouched

If I have
123456red100green
123456bee010yellow
123456usb110orange
123456sos011querty
123456let101bottle
and I want it to be
123456red111green
123456bee111yellow
123456usb111orange
123456sos111querty
123456let111bottle
Notice that the first 6 characters don't change.
Of the following 6, only the last 3 change.
also these strings might be anywhere in a file (beginning, end, anywhere)
I want to specify sed to
1)find 123456
2)skip the next three characters
3)replace the next three with 111
The closest I've come to is:
sed '/s/123456....../123456...111/g'
I know dots mean any character, but I don't know the equivalent on the replacement side. In short, how do I tell sed to leave some characters in a match untouched?
Sorry for having been unclear about what I want; please bear with me.
Matching 123456 followed by three characters that are not to be modified, and then replacing the next three characters with 111:
sed 's/\(123456...\).../\1111/g' file
The \( ... \) captures the part of the string that we don't want to modify. These are re-inserted with \1. The whole matching bit of the line is replaced by "the bit in the \( ... \) (i.e. \1) followed by 111".
If you want to change each and every zero (as in your examples), then just sed 's/0/1/g' would do. Or sed -e '/^123456/ s/0/1/g' to do the same on lines starting with 123456.
But to count characters, as you ask, use ( .. ) to capture the part to keep and \1 to put it back (using sed -E). So:
echo 123456abcdefgh | sed -Ee 's/^(123456...).../\1111/'
outputs 123456abc111gh. The \1 puts back the part matched by 123456...; the three 1s after it are literal characters. Note that the ^ anchors the match to the start of the line; drop it if the string can appear anywhere.
(Without -E, you'd need \( .. \) to group.)
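As a small sketch of the capture-and-reinsert technique, here is the unanchored form run on one line from the question and on a made-up line where the string sits in the middle of other text (the variable name replaced is illustrative):

```shell
# \(123456...\) is kept via \1; the following three characters become 111.
replaced=$(printf '%s\n' '123456red100green' 'xx 123456usb110orange yy' |
  sed 's/\(123456...\).../\1111/g')

printf '%s\n' "$replaced"
# prints:
# 123456red111green
# xx 123456usb111orange yy
```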

Matching strings even if they start with white spaces in SED

I'm having issues matching strings even if they start with any number of white spaces. It's been very little time since I started using regular expressions, so I need some help
Here is an example. I have a file (file.txt) that contains two lines
#String1='Test One'
String1='Test Two'
I'm trying to change the value on the second line without affecting line 1, so I used this:
sed -i "s|String1=.*$|String1='Test Three'|g"
This changes the values for both lines. How can I make sed change only the value of the second string?
Thank you
With gnu sed, you match spaces using \s, while other sed implementations usually work with the [[:space:]] character class. So, pick one of these:
sed 's/^\s*AWord/AnotherWord/'
sed 's/^[[:space:]]*AWord/AnotherWord/'
Since you're using -i, I assume GNU sed. Either way, you probably shouldn't retype your word, as that introduces the chance of a typo. I'd go with:
sed -i "s/^\(\s*String1=\).*/\1'New Value'/" file
Move the \s* outside of the parens if you don't want to preserve the leading whitespace.
There are a couple of solutions you could use to go about your problem
If you want to ignore lines that begin with a comment character such as '#' you could use something like this:
sed -i "/^\s*#/! s|String1=.*$|String1='Test Three'|g" file.txt
which operates only on lines that do not match (the trailing !) the regular expression /^\s*#/, that is, lines that do not begin (^) with optional whitespace (\s*) followed by an octothorpe (#).
The other option is to include the characters before 'String' as part of the substitution. Doing it this way means you'll need to capture \(...\) the group to include it in the output with \1
sed -i "s|^\(\s*\)String1=.*$|\1String1='Test Four'|g" file.txt
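The capture-group variant can be tried without touching a file by dropping -i and piping sample lines in. This is a sketch only: it uses [[:space:]] in place of \s on the assumption that portability matters, indents the second sample line to show that the leading whitespace survives, and the variable name edited is made up:

```shell
# Line 1 starts with #, so ^[[:space:]]*String1= cannot match it.
# Line 2 is indented; \1 puts the indentation back in front of the new value.
edited=$(printf '%s\n' "#String1='Test One'" "  String1='Test Two'" |
  sed "s|^\([[:space:]]*\)String1=.*|\1String1='Test Three'|")

printf '%s\n' "$edited"
```

Because .* is greedy and already runs to the end of the line, the trailing $ from the original command is not needed here, which also avoids any question about $ inside double quotes.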
With GNU sed, try:
sed -i "s|^\s*String1=.*$|String1='Test Three'|" file
or
sed -i "/^\s*String1=/s/=.*/='Test Three'/" file
Using awk you could do:
awk '/String1/ && f++ {$2="Test Three"}1' FS=\' OFS=\' file
#String1='Test One'
String1='Test Three'
It skips the first line that matches String1, because f++ evaluates to 0 (false) the first time and is true on every later match.

Extract CentOS mirror domain names using sed

I'm trying to extract a list of CentOS domain names only from http://mirrorlist.centos.org/?release=6.4&arch=x86_64&repo=os
Truncating prefix "http://" and "ftp://" to the first "/" character only resulting a list of
yum.phx.singlehop.com
mirror.nyi.net
bay.uchicago.edu
centos.mirror.constant.com
mirror.teklinks.com
centos.mirror.netriplex.com
centos.someimage.com
mirror.sanctuaryhost.com
mirrors.cat.pdx.edu
mirrors.tummy.com
I searched stackoverflow for the sed method but I'm still having trouble.
I tried doing this with sed
curl "http://mirrorlist.centos.org/?release=6.4&arch=x86_64&repo=os" | sed '/:\/\//,/\//p'
but doesn't look like it is doing anything. Can you give me some advice?
Here you go:
curl "http://mirrorlist.centos.org/?release=6.4&arch=x86_64&repo=os" | sed -e 's?.*://??' -e 's?/.*??'
Your sed was completely wrong:
/x/,/y/ is a range. It selects multiple lines, from a line matching /x/ until a line matching /y/
The p command prints the selected range
Since all lines match both the start and end pattern you used, you effectively selected all lines. And, since sed echoes the input by default, the p command results in duplicated lines (all lines printed twice).
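The duplication described above is easy to reproduce offline. In this sketch the two URLs are invented sample lines (the host names come from the question, the paths are placeholders), and the variable name doubled is illustrative:

```shell
# Both lines contain :// and /, so both fall inside the range.
# Each line is printed once by p and once by sed's default output.
doubled=$(printf '%s\n' 'http://mirror.nyi.net/centos/' 'ftp://mirrors.tummy.com/pub/' |
  sed '/:\/\//,/\//p')

printf '%s\n' "$doubled"
```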
In my fix:
I used s??? instead of s/// because this way I didn't need to escape all the / in the patterns, so it's a bit more readable this way
I used two expressions with the -e flag:
s?.*://?? matches everything up until :// and replaces it with nothing
s?/.*?? matches everything from the first / until the end of the line and replaces it with nothing
The two expressions are executed in the given order
In modern versions of sed you can omit -e and separate the two expressions with ;. I stick to using -e because it's more portable.
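The two expressions can be checked without network access by piping in sample lines instead of the curl output. The URLs below are assumptions for illustration (host names from the question, placeholder paths), as is the variable name domains:

```shell
# First expression strips everything through ://, second strips from the first / on.
domains=$(printf '%s\n' 'http://mirror.nyi.net/centos/6.4/os/x86_64/' 'ftp://mirrors.tummy.com/pub/centos/' |
  sed -e 's?.*://??' -e 's?/.*??')

printf '%s\n' "$domains"
# prints:
# mirror.nyi.net
# mirrors.tummy.com
```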

Insert newline after pattern with changing number in sed

I want to insert a newline after the following pattern
lcl|NC_005966.1_gene_750
The last number (750 in this case) changes from line to line; the numbers range from 1 to 3407.
How can I tell sed to keep this pattern together and not split it after the first digit?
So far I found
sed 's/lcl|NC_005966.1_gene_[[:digit:]]/&\n/g' file
But this breaks off, after the first digit.
Try:
sed 's/lcl|NC_005966.1_gene_[[:digit:]]*/&\n/g' file
(note the *)
Alternatively, you could say:
sed '/lcl|NC_005966.1_gene_[[:digit:]]/G' file
which appends an empty line after every line that contains the pattern. Note that this adds the newline at the end of the line, not directly after the match.
sed 's/lcl|NC_005966\.1_gene_[[:digit:]][[:digit:]]*/&\
/g' file
You need to escape the . because it is an RE metacharacter, you need [[:digit:]][[:digit:]]* to represent one or more digits, and you need \ followed by a literal newline for portability across seds.
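A minimal sketch of the portable form, run on a made-up input line (the ATGAAA text after the identifier is a placeholder, and the variable name split is illustrative):

```shell
# The backslash at the end of the first script line, followed by a real newline,
# inserts a newline into the replacement on any POSIX sed.
split=$(printf '%s\n' 'lcl|NC_005966.1_gene_750ATGAAA' |
  sed 's/lcl|NC_005966\.1_gene_[[:digit:]][[:digit:]]*/&\
/g')

printf '%s\n' "$split"
# prints:
# lcl|NC_005966.1_gene_750
# ATGAAA
```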