Manipulate ampersand in sed - sed

Is it possible to manipulate the ampersand in sed? I want to add +1 to all numbers in a file. Something like this:
sed -i "s/[0-9]\{1,2\}/$(expr & + 1)/g" filename
EDIT: Today I created a loop using grep and sed that does the job needed. But the question remains open if anyone knows of a way of manipulating the ampersand, since this is not the first time I wanted to run commands on the replacement string, and couldn't.

You may use e modifier to achieve this:
$ cat test.txt
1
2
$ sed 's/^[0-9]\{1,2\}$/expr & + 1/e' test.txt
2
3
In this case you should construct command in replacement part which will be executed and result will be used for replacement.

sed will need to thunk out to some shell command (with '!') on each line to do that.
Here you think you are calling sed which then calls back to the shell to evaluate $(expr & + 1) for each line, but actually it isn't. $(expr & + 1) will just get statically evaluated (once) by the outer shell, and cause an error, since '&' is not at that point a number.
To actually do this, either:
hardcode all ten cases of last digit 0..9, as per this example in sed documentation
Use a sed-command which starts with '1,$!' to invoke the shell on each line, and perform the increment there, with expr, awk, perl or whatever.
FOOTNOTE: I never knew about the /e modifier, which php-coder shows.

Great question. smci answered first and was spot on about shells.
In case you want to solve this problem in general, here is (for fun) a Ruby solution embedded in an example:
echo "hdf 4 fs 88\n5 22 sdf lsd 6" | ruby -e 'ARGF.each {|line| puts line.gsub(/(\d+)/) {|n| n.to_i+1}}'
The output should be
hdf 5 fs 89\n6 23 sdf lsd 7

Related

Print strings alongside math results in s// substitution with the /e modifier

I am trying to write a very simple one liner to find cases of:
foo N
and replace them with
foo N-Y
For example, if I had 3 files and they had the following lines in them:
foo 5
foo 3
foo 9
After the script is run with Y=4, the lines would read:
foo 1
foo -1
foo 5
I stumbled upon an existing thread that suggested using /e to run code in the replace half of the substitute command and was able to effectively subtract Y from all my matches, but I have no idea how to best print "foo" back into the file since when I try to separate foo and the number into two capture groups and print them back in, perl thinks I am trying to multiply them and wants an operator.
Here's where I'm at:
find . -iname "*somematch*" -exec perl -pi -e 's/(Foo *)(\d+)/$1$2-4/e' {} \;
Of course this doesn't work, "Scalar found where operator expected at -e line 1, near "$1$2." I'm at a loss as to how best to proceed without writing something much longer.
Edit: To be more specific, if I have the /e option enabled to be able to perform math in the substitution, is there a simple way to print the string in another capture group in that substitution without it trying to do math to it?
Alternatively, is there a simple way to surgically perform the substitution on only part of the pattern? I tried to combine m// and s/// to achieve the results but ended up getting nowhere.
The replacement part is treated as code under /e so it need be written using legal syntax, like you'd use in a program. Writing $t$v isn't legal syntax ($1$2 in your regex).
One way to concatenate strings is $t . $v. Then you also need parenthesis around the addition, since by precedence rules the strings $1 and $2 are concatenated first, and that alphanumeric string attempted in addition, drawing a warning. So
perl -i -pe's/(Foo *)([0-9]+)/$1.($2-4)/e'
I replaced \d with [0-9] since \d matches all kinds of "digits," from all over Unicode, what doesn't seem to be what you need.
There is another way if the math comes after the rest of the pattern, as it does in your examples
perl -i -pe's/Foo *\K([0-9]+)/$1-4/e'
Here the \K is a form of positive lookbehind which drops all matches previous to that anchor, so they are not consumed. Thus only the [0-9]+ is replaced, as needed.

Use sed to take all lines containing regex and append to end of file

I'm trying to come up with a sed script to take all lines containing a pattern and move them to the end of the output. This is an exercise in learning hold vs pattern space and I'm struggling to come up with it (though I feel close).
I'm here:
$ echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" | sed -E '/foo/H; //d; $G'
hi
bar
something
yo
foo1
foo2
But I want the output to be:
hi
bar
something
yo
foo1
foo2
I understand why this is happening. It is because the first time we find foo the hold space is empty so the H appends \n to the blank hold space and then the first foo, which I suppose is fine. But then the $G does it again, namely another append which appends \n plus what is in the hold space to the pattern space.
I tried a final delete command with /^$/d but that didn't remove the blank line (I think this is because this pattern is being matched not against the last line, but against the, now, multiline pattern space which has a \n\n in it.
I'm sure the sed gurus have a fix for me.
This might work for you (GNU sed):
sed '/foo/H;//!p;$!d;x;//s/.//p;d' file
If the line contains the required string append it to the hold space (HS) otherwise print it as normal. If it is not the last line delete it otherwise swap the HS for the pattern space (PS). If the required string(s) is now in the PS (what was the HS); since all such patterns were appended, the first character will be a newline, delete the first character and print. Delete whatever is left.
An alternative, using the -n flag:
sed -n '/foo/H;//!p;$!b;x;//s/.//p' file
N.B. When the d or b (without a parameter) command is performed no further sed commands are, a new line is read into the PS and the sed script begins with the first command i.e. the sed commands do not resume following the previous d command.
Why? Stuff like this is absolutely trivial in awk, awk is available everywhere that sed is, and the resulting awk script will be simpler, more portable, faster and better in almost every other way than a sed script to do the same task. All that hold space stuff was necessary in sed before the mid-1970s when awk was invented but there's absolutely no use for it now other than as a mental exercise.
$ echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" |
awk '/foo/{buf = buf $0 RS;next} {print} END{printf "%s",buf}'
hi
bar
something
yo
foo1
foo2
The above will work as-is in every awk on every UNIX installation and I bet you can figure out how it works very easily.
This feels like a hack and I think it should be possible to handle this situation more gracefully. The following works on GNU sed:
echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" | sed -r '/foo/{H;d;}; $G; s/\n\n/\n/g'
However, on OSX/BSD sed, results in this odd output:
hi
bar
something
yonfoo1
foo2
Note the 2 consecutive newlines was replaced with the literal character n
The OSX/BSD vs GNU sed is explained in this article. And the following works (in GNU SED as well):
echo -e "hi\nfoo1\nbar\nsomething\nfoo2\nyo" | sed '/foo/{H;d;}; $G; s/\n\n/\'$'\n''/'
TL;DR; in BSD sed, it does not accept escaped characters in the RHS of the replacement expression and so you either have to put a true LF/newline in there at the command line, or do the above where you split the sed script string where you need the newline on the RHS and put a dollar sign in front of '\n' so the shell interprets it as a line feed.

The perl -pe command

So I've done a research about the perl -pe command and I know that it takes records from a file and creates an output out of it in a form of another file. Now I'm a bit confused as to how this line of command works since it's a little modified so I can't really figure out what exactly is the role of perl pe in it. Here's the command:
cd /usr/kplushome/entities/Standalone/config/webaccess/WebaccessServer/etc
(PATH=/usr/ucb:$PATH; ./checkall.sh;) | perl -pe "s,^, ,g;"
Any idea how it works here?
What's even more confusing in the above statement is this part : "s,^, ,g;"
Any help would be much appreciated. Let me know if you guys need more info. Thank you!
It simply takes an expression given by the -e flag (in this case, s,^, ,g) and performs it on every line of the input, printing the modified line (i.e. the result of the expression) to the output.
The expression itself is something called a regular expression (or "regexp" or "regex") and is a field of learning in and of itself. Quick googles for "regular expression tutorial" and "getting started with regular expressions" turn up tons of results, so that might be a good place to start.
This expression, s,^, ,g, adds ten spaces to the start of the line, and as I said earlier, perl -p applies it to every line.
"s,^, ,g;"
s is use for substitution. syntax is s/somestring/replacement/.
In your command , is the delimiter instead of /.
g is for work globally, means replace all occurrence.
For example:
perl -p -i -e "s/oldstring/newstring/g" file.txt;
In file.txt all oldstring will replace with newstring.
i is for inplace file editing.
See these doc for information:
perlre
perlretut
perlop

Extract CentOS mirror domain names using sed

I'm trying to extract a list of CentOS domain names only from http://mirrorlist.centos.org/?release=6.4&arch=x86_64&repo=os
Truncating prefix "http://" and "ftp://" to the first "/" character only resulting a list of
yum.phx.singlehop.com
mirror.nyi.net
bay.uchicago.edu
centos.mirror.constant.com
mirror.teklinks.com
centos.mirror.netriplex.com
centos.someimage.com
mirror.sanctuaryhost.com
mirrors.cat.pdx.edu
mirrors.tummy.com
I searched stackoverflow for the sed method but I'm still having trouble.
I tried doing this with sed
curl "http://mirrorlist.centos.org/?release=6.4&arch=x86_64&repo=os" | sed '/:\/\//,/\//p'
but doesn't look like it is doing anything. Can you give me some advice?
Here you go:
curl "http://mirrorlist.centos.org/?release=6.4&arch=x86_64&repo=os" | sed -e 's?.*://??' -e 's?/.*??'
Your sed was completely wrong:
/x/,/y/ is a range. It selects multiple lines, from a line matching /x/ until a line matching /y/
The p command prints the selected range
Since all lines match both the start and end pattern you used, you effectively selected all lines. And, since sed echoes the input by default, the p command results in duplicated lines (all lines printed twice).
In my fix:
I used s??? instead of s/// because this way I didn't need to escape all the / in the patterns, so it's a bit more readable this way
I used two expressions with the -e flag:
s?.*://?? matches everything up until :// and replaces it with nothing
s?/.*?? matches everything from / until the end replaces it with nothing
The two expressions are executed in the given order
In modern versions of sed you can omit -e and separate the two expressions with ;. I stick to using -e because it's more portable.

How can I remove all non-word characters except the newline?

I have a file like this:
my line - some words & text
oh lóok i've got some characters
I want to 'normalize' it and remove all the non-word characters. I want to end up with something like this:
mylinesomewordstext
ohlóokivegotsomecharacters
I'm using Linux on the command line at the moment, and I'm hoping there's some one-liner I can use.
I tried this:
cat file | perl -pe 's/\W//'
But that removed all the newlines and put everything one line. Is there someway I can tell Perl to not include newlines in the \W? Or is there some other way?
This removes characters that don't match \w or \n:
cat file | perl -C -pe 's/[^\w\n]//g'
#sth's solution uses Perl, which is (at least on my system) not Unicode compatible, thus it loses the accented o character.
On the other hand, sed is Unicode compatible (according to the lists on this page), and gives a correct result:
$ sed 's/\W//g' a.txt
mylinesomewordstext
ohlóokivegotsomecharacters
In Perl, I'd just add the -l switch, which re-adds the newline by appending it to the end of every print():
perl -ple 's/\W//g' file
Notice that you don't need the cat.
The previous response isn't echoing the "ó" character. At least in my case.
sed 's/\W//g' file
Best practices for shell scripting dictate that you should use the tr program for replacing single characters instead of sed, because it's faster and more efficient. Obviously use sed if replacing longer strings.
tr -d '[:blank:][:punct:]' < file
When run with time I get:
real 0m0.003s
user 0m0.000s
sys 0m0.004s
When I run the sed answer (sed -e 's/\W//g' file) with time I get:
real 0m0.003s
user 0m0.004s
sys 0m0.004s
While not a "huge" difference, you'll notice the difference when running against larger data sets. Also please notice how I didn't pipe cat's output into tr, instead using I/O redirection (one less process to spawn).