sed backreference not being found - sed

I am trying to use 'sed' to replace a list of paths in a file with another path.
An example string to process is:
/path/to/file/block
I want to replace /path/to/file with something else.
I have Tried
sed -r '/\s(\S+)\/block/s/\1/new_path/'
I know it's finding the matching string but I'm getting an invalid back reference error.
How can I do this?

This may do:
echo "/path/to/file/block" | sed -r 's|/\S*/(block)|/newpath/\1|'
/newpath/block
Test
echo "test=/path/file test2=/path/to/file/block test3=/home/root/file" | sed -r 's|/\S*/(block)|/newpath/\1|'
test=/path/file test2=/newpath/block test3=/home/root/file

Back-references always refer to the pattern of the s command, not to any address (before the command).
However, in this case, there's no need for addressing: we can apply the substitution to all lines (and it will change only lines where it matches), so we can write:
s,\s(\S+)/block/, \1/new_path,
(I added a space to the RHS, as I'm guessing you didn't mean to overwrite that; also used a different separator to reduce the need for backslashes.)

Related

Manipulate characters with sed

I have a list of usernames and i would like add possible combinations to it.
Example. Lets say this is the list I have
johna
maryb
charlesc
Is there is a way to use sed to edit it the way it looks like
ajohn
bmary
ccharles
And also
john_a
mary_b
charles_c
etc...
Can anyone assist me into getting the commands to do so, any explanation will be awesome as well. I would like to understand how it works if possible. I usually get confused when I see things like 's/\.(.*.... without knowing what some of those mean... anyway thanks in advance.
EDIT ... I change the username
sed s/\(user\)\(.\)/\2\1/
Breakdown:
sed s/string/replacement/ will replace all instances of string with replacement.
Then, string in that sed expression is \(user\)\(.\). This can be broken down into two
parts: \(user\) and \(.\). Each of these is a capture group - bracketed by \( \). That means that once we've matched something with them, we can reuse it in the replacement string.
\(user\) matches, surprisingly enough, the user part of the string. \(.\) matches any single character - that's what the . means. Then, you have two captured groups - user and a (or b or c).
The replacement part just uses these to recreate the pattern a little differently. \2\1 says "print the second capture group, then the first capture group". Which in this case, will print out auser - since we matched user and a with each group.
ex:
$ echo "usera
> userb
> userc" | sed "s/\(user\)\(.\)/\2\1/"
auser
buser
cuser
You can change the \2\1 to use any string you want - ie. \2_\1 will give a_user, b_user, c_user.
Also, in order to match any preceding string (not just "user"), just replace the \(user\) with \(.*\). Ex:
$ echo "marya
> johnb
> alfredc" | sed "s/\(.*\)\(.\)/\2\1/"
amary
bjohn
calfred
here's a partial answer to what is probably the easy part. To use sed to change usera to user_a you could use:
sed 's/user/user_/' temp
where temp is the name of the file that contains your initial list of usernames. How this works: It is finding the first instance of "user" on each line and replacing it with "user_"
Similarly for your dot example:
sed 's/user/user./' temp
will replace the first instance of "user" on each line with "user."
Sed does not offer non-greedy regex, so I suggest perl:
perl -pe 's/(.*?)(.)$/$2$1/g' file
ajohn
bmary
ccharles
perl -pe 's/(.*?)(.)$/$1_$2/g' file
john_a
mary_b
charles_c
That way you don't need to know the username before hand.
Simple solution using awk
awk '{a=$NF;$NF="";$0=a$0}1' FS="" OFS="" file
ajohn
bmary
ccharles
and
awk '{a=$NF;$NF="";$0=$0"_" a}1' FS="" OFS="" file
john_a
mary_b
charles_c
By setting FS to nothing, every letter is a field in awk. You can then easy manipulate it.
And no need to using capturing groups etc, just plain field swapping.
This might work for you (GNU sed):
sed -r 's/^([^_]*)_?(.)$/\2\1/' file
This matches any charactes other than underscores (in the first back reference (\1)), a possible underscore and the last character (in the second back reference (\2)) and swaps them around.

Matching strings even if they start with white spaces in SED

I'm having issues matching strings even if they start with any number of white spaces. It's been very little time since I started using regular expressions, so I need some help
Here is an example. I have a file (file.txt) that contains two lines
#String1='Test One'
String1='Test Two'
Im trying to change the value for the second line, without affecting line 1 so I used this
sed -i "s|String1=.*$|String1='Test Three'|g"
This changes the values for both lines. How can I make sed change only the value of the second string?
Thank you
With gnu sed, you match spaces using \s, while other sed implementations usually work with the [[:space:]] character class. So, pick one of these:
sed 's/^\s*AWord/AnotherWord/'
sed 's/^[[:space:]]*AWord/AnotherWord/'
Since you're using -i, I assume GNU sed. Either way, you probably shouldn't retype your word, as that introduces the chance of a typo. I'd go with:
sed -i "s/^\(\s*String1=\).*/\1'New Value'/" file
Move the \s* outside of the parens if you don't want to preserve the leading whitespace.
There are a couple of solutions you could use to go about your problem
If you want to ignore lines that begin with a comment character such as '#' you could use something like this:
sed -i "/^\s*#/! s|String1=.*$|String1='Test Three'|g" file.txt
which will only operate on lines that do not match the regular expression /.../! that begins ^ with optional whiltespace\s* followed by an octothorp #
The other option is to include the characters before 'String' as part of the substitution. Doing it this way means you'll need to capture \(...\) the group to include it in the output with \1
sed -i "s|^\(\s*\)String1=.*$|\1String1='Test Four'|g" file.txt
With GNU sed, try:
sed -i "s|^\s*String1=.*$|String1='Test Three'|" file
or
sed -i "/^\s*String1=/s/=.*/='Test Three'/" file
Using awk you could do:
awk '/String1/ && f++ {$2="Test Three"}1' FS=\' OFS=\' file
#String1='Test One'
String1='Test Three'
It will ignore first hits of string1 since f is not true.

Extract CentOS mirror domain names using sed

I'm trying to extract a list of CentOS domain names only from http://mirrorlist.centos.org/?release=6.4&arch=x86_64&repo=os
Truncating prefix "http://" and "ftp://" to the first "/" character only resulting a list of
yum.phx.singlehop.com
mirror.nyi.net
bay.uchicago.edu
centos.mirror.constant.com
mirror.teklinks.com
centos.mirror.netriplex.com
centos.someimage.com
mirror.sanctuaryhost.com
mirrors.cat.pdx.edu
mirrors.tummy.com
I searched stackoverflow for the sed method but I'm still having trouble.
I tried doing this with sed
curl "http://mirrorlist.centos.org/?release=6.4&arch=x86_64&repo=os" | sed '/:\/\//,/\//p'
but doesn't look like it is doing anything. Can you give me some advice?
Here you go:
curl "http://mirrorlist.centos.org/?release=6.4&arch=x86_64&repo=os" | sed -e 's?.*://??' -e 's?/.*??'
Your sed was completely wrong:
/x/,/y/ is a range. It selects multiple lines, from a line matching /x/ until a line matching /y/
The p command prints the selected range
Since all lines match both the start and end pattern you used, you effectively selected all lines. And, since sed echoes the input by default, the p command results in duplicated lines (all lines printed twice).
In my fix:
I used s??? instead of s/// because this way I didn't need to escape all the / in the patterns, so it's a bit more readable this way
I used two expressions with the -e flag:
s?.*://?? matches everything up until :// and replaces it with nothing
s?/.*?? matches everything from / until the end replaces it with nothing
The two expressions are executed in the given order
In modern versions of sed you can omit -e and separate the two expressions with ;. I stick to using -e because it's more portable.

What is regular expression for first field containing alpha-numeric?

I have data that starts out like this in a .csv file
"684MF7","684MF7","RN"
The first field "684MF7" should only contain numeric characters; no alpha characters should be present in the first field. I have other checks for the second field, which in this case is also "684MF7", which is a legitimate value for the second field.
I want to find any alpha in the first field, and print that line. I invoke this sed file
{
/^".*[^0-9]*.*",/p
}
with -n and -f (for the file name).
What regular expression isolates the first field only? I am getting a match on everything, which isn't what I want. Is my problem because I am trying to match zero or more instead of 1 or more alpha characters?
The first field (any content) would be selected by:
/^"[^"]*"/
You want at least one of the characters in the field to be alpha (though it might be better regarded as 'non-digit'), in which case one of these should select what you're after:
/^"[^"]*[A-Za-z][^"]*"/
/^"[^"]*[^0-9"][^"]*"/
/^"[^"]*[[:alpha:]][^"]*"/
/^"[^"]*[^"[:digit:]][^"]*"/
Note that the negated classes must not match a double quote either (one reason for always testing answers — the first version of the script below listed both lines of input).
And converting one of those into a sed command:
sed -n '/^"[^"]*[^"[:digit:]][^"]*"/p' <<EOF
"684MF7","684MF7","RN"
"684007","684MF7","RN"
EOF
Another way of looking at the problem is "print any line where the first field is not all digits (with at least one digit present)". That is:
sed -n '/^"[[:digit:]]\{1,\}"/!p' <<EOF
"684MF7","684MF7","RN"
"684007","684MF7","RN"
EOF
On the whole, this is perhaps the better solution to use (and I shan't complain if you use [0-9] in place of [[:digit:]]).
Generally .* surrounding any other expressions tends to match more than expected. Try to write an expression that is more detailed with less large wildcard matches
I found this to work
> sed -n '/^".*[A-Z].*",".*",".*"/p' <(echo '"684MF7","684MF7","RN"')
> "684MF7","684MF7","RN"
> sed -n '/^".*[A-Z].*",".*",".*"/p' <(echo '"684117","684MF7","RN"')
>
It picks up all of the groups surrounded with "
Perhaps the following will work for you?
echo '"A84MF7","684MF7","3N"' | sed -n '/^"[^0-9,][^",]*"/p'
"A84MF7","684MF7","3N"
echo '"684MF7","684MF7","7N"' | sed -n '/^"[^0-9,][^",]*"/p'
--Nothing--

How to use sed to return something from first line which matches and quit early?

I saw from
How to use sed to replace only the first occurrence in a file?
How to do most of what I want:
sed -n '0,/.*\(something\).*/s//\1/p'
This finds the first match of something and extracts it (of course my real example is more complicated). I know sed has a 'quit' command which will make it stop early, but I don't know how to combine the 'q' with the above line to get my desired behavior.
I've tried replacing the 'p' with {p;q;} as I've seen in some examples, but that is clearly not working.
The "print and quit" trick works, if you also put the substitution command into the block (instead of only putting the print and quit there).
Try:
sed -n '/.*\(something\).*/{s//\1/p;q}'
Read this like: Match for the pattern given between the slashes. If this pattern is found, execute the actions specified in the block: Replace, print and exit. Typically, match- and substitution-patterns are equal, but this is not required, so the 's' command could also contain a different RE.
Actually, this is quite similar to what ghostdog74 answered for gawk.
My specific use case was to find out via 'git log' on a git-p4 imported tree which perforce branch was used for the last commit. git log (when called without -n will log every commit that every happened (hundreds of thousands for me)).
We can't know a-priori what value to give git for '-n.' After posting this, I found my solution which was:
git log | sed -n '0,/.*\[git-p4:.*\/\/depot\/blah\/\([^\/]*\)\/.*/s//\1/p; /\[git-p4/ q'
I'd still like to know how to do this with a non '/' separator, and without having to specify the 'git-p4' part twice, once for the extraction and once for the quit. There has got to be a way to combine both on the same line...
Sed usually has an option to specify more than one pattern to execute (IIRC, it's the -e option). That way, you can specify a second pattern that quits after the first line.
Another approach is to use sed to extract the first line (sed '1q'), then pipe that to a second sed command (what you show above).
use gawk
gawk '/MATCH/{
print "do something with "$0
exit
}' file
To "return something from first line and quit" you could also use grep:
grep --max-count 1 'something'
grep is supposed to be faster than sed and awk [1]. Even if not, this syntax is easier to remember (and to type: grep -m1 'something').
If you don't want to see the whole line but just the matching part, the --only-matching (-o) option might suffice.