Deleting lines of a file with sed - unexpected behaviour

I noticed something a bit odd while fooling around with sed. If you try to remove multiple line intervals (by number) from a file, but any interval specified later in the list is fully contained within an interval earlier in the list, then an additional single line is removed after the specified (larger) interval.
seq 10 > foo.txt
sed '2,7d;3,6d' foo.txt
1
9
10
This behaviour was behind an annoying bug for me, since in my script I generated the interval endpoints on the fly, and in some cases the intervals produced were redundant. I can clean this up, but I can't think of a good reason why sed would behave this way on purpose.
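For endpoints generated on the fly, one way to sidestep the problem is to avoid sed's range state altogether and test each line number against every interval. A minimal sketch, assuming a POSIX shell and awk; the function name delete_ranges and its START,END argument format are made up for illustration:

```shell
# Hypothetical helper: delete_ranges FILE START,END [START,END ...]
# Each line number is tested against every interval independently, so
# overlapping or nested intervals cannot interfere with one another.
delete_ranges() {
    file=$1
    shift
    awk -v ranges="$*" '
        BEGIN {
            n = split(ranges, r, " ")
            for (i = 1; i <= n; i++) {
                split(r[i], p, ",")
                lo[i] = p[1] + 0   # force numeric comparison
                hi[i] = p[2] + 0
            }
        }
        {
            for (i = 1; i <= n; i++)
                if (NR >= lo[i] && NR <= hi[i])
                    next           # line falls inside an interval: drop it
            print
        }
    ' "$file"
}

tmp=$(mktemp)
seq 10 > "$tmp"
delete_ranges "$tmp" 2,7 3,6    # prints 1, 8, 9, 10, one per line
rm -f "$tmp"
```

Because no state carries over from one line to the next, the order of the intervals does not matter here.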

Since this question was highlighted as needing an answer in the Stack Overflow Weekly Newsletter email for 2015-02-24, I'm converting the comments above (which provide the answer) into a formal answer. Unattributed comments here were made by me in essentially equivalent form.
Thank you for a concise, complete question. The result is interesting. I can reproduce it with your script. Intriguingly, sed '3,6d;2,7d' foo.txt (with the delete operations in the reverse order) produces the expected answer with 8 included in the output. That makes it look like it might be a reportable bug in (GNU) sed, especially as BSD sed (on Mac OS X 10.10.2 Yosemite) works correctly with the operations in either order. I tested using 'sed (GNU sed) 4.2.2' from an Ubuntu 14.04 derivative.
More data points for you/them. Both of these include 8 in the output:
sed -e '/2/,/7/d' -e '/3/,/6/d' foo.txt
sed -e '2,7d' -e '/3/,/6/d' foo.txt
By contrast, this does not:
sed -e '/2/,/7/d' -e '3,6d' foo.txt
The latter surprised me (even accepting the basic bug).
Beats me. Given some of sed's arcane constructs, I thought you might be missing the batman symbol or something from the middle of your command, but sed -e '2,7d' -e '3,6d' foo.txt behaves the same way, and swapping the order produces the expected results (GNU sed 4.2.2 on Cygwin). /bin/sed on Solaris always produces the expected result and, interestingly, so does GNU sed 3.02. (Ed Morton)
More data: with sed 4.2.2 it only seems to happen if the 2nd range is a subset of the first: sed '2,5d;2,5d' shows the bug; sed '2,5d;1,5d' and sed '2,5d;2,6d' do not. (glenn jackman)
The GNU sed home page says "Please send bug reports to bug-sed at gnu.org" (except it has an @ in place of ' at '). You've got a good reproduction; be explicit about the output you expect vs the output you get (they'll get the point, but it's best to make sure they can't misunderstand). Point out that the reverse ordering of the commands works as expected, and give the various other commands as examples of working or not working. (You could even give this Q&A URL as a cross-reference, but make sure that the bug report is self-contained so that it can be understood even if no-one follows the URL.)
You can also point to BSD sed (and the Solaris version, and the older GNU 3.02 sed) as behaving as expected. With the old version GNU sed working, it means this is arguably a regression. […After a little experimentation…] The breakage occurred in the 4.1 release; the 4.0.9 release is OK. (I also checked 4.1.5 and 4.2.1; both are broken.) That will help the maintainers if they want to find the trouble by looking at what changed.
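The advice about stating expected versus actual output can be scripted. A minimal sketch, assuming a POSIX shell with seq available; the variable names are illustrative, and what the first ordering prints depends on the sed version, so no output is shown for it:

```shell
tmp=$(mktemp)
seq 10 > "$tmp"

# Deleting lines 2-7 (and the redundant 3-6) should leave 1, 8, 9, 10.
expected=$(printf '%s\n' 1 8 9 10)

for script in '2,7d;3,6d' '3,6d;2,7d'; do
    actual=$(sed "$script" "$tmp")
    if [ "$actual" = "$expected" ]; then
        verdict='as expected'
    else
        verdict='UNEXPECTED'
    fi
    # Flatten the output onto one line so the two cases are easy to compare.
    printf '%s -> %s (%s)\n' "$script" "$(printf '%s' "$actual" | tr '\n' ' ')" "$verdict"
done

# Record which sed produced the results; --version is a GNU extension.
sed --version 2>/dev/null | head -n 1

rm -f "$tmp"
```

Pasting the script together with its output into the report keeps it self-contained.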
The OP noted:
Thanks everyone for comments and additional tests. I'll submit a bug report to GNU sed and post their response. santayana

Related

Can someone please assist me expand sed to extract required data?

I must admit to sed seeming at times to be a bit of a black art to me; I found the following statement, which provided some of what I require. I am assuming that sed is my handiest option as it will be in a bash script.
I have a file with lots of stuff, e.g.
LOTS_OF_OTHER_STUFF/STRING1\nOL8.0:2019-10-08-2/STRING2/LOTS_OF_OTHER_STUFF_HERE/STRING1\nOL8-slim:2019-10-08-20/SRING2/LOTS_OF_OTHER_STUFF
sed '/STRING1/!d;s//&\n/;s/.*\n//;:a;/STRING2/bb;$!{n;ba};:b;s//\n&/;P;D'
\nOL8.0:2019-10-08-2/
\nOL8-slim:2019-10-08-20/
What I require is:
8.0 2
8-slim 20
Can anyone help?

Meaning of ",$d" in replacement part of sed command

I came across this command in a project I am working on:
sed -i '/regex/,$d' file
I don't understand how the ,$d part works. If I omit any part of ,$d I get errors. In my tests it looks like it replaces the matching line and anything after it with nothing. Example:
File with contents:
first line
second line regex
third line
fourth line
Comes out as after running that command:
first line
I couldn't find any documentation in the man page that explains this, though I could have easily missed it. The man page is hard for me to parse...
This example was tested with GNU sed v4.2.2.
This is not a replacement command; the sed substitute or replace command looks like s/from/to/.
The general form of a sed script is a sequence of commands - typically a single letter, but some of them take arguments, like the s command above - with an optional address expression before each. You are looking at a d (delete line) command preceded by the address expression /regex/,$
The address range specifies lines from the first regex match through to the end of the file ($ in this context specifies the last line) and the action d deletes the specified lines.
Although many people only ever encounter simple sed scripts which use just the s command, this behavior will be described in any basic introduction to sed, as well as in the man page.
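To make the address-plus-command structure concrete, here is a small sketch using the sample file from the question (the temporary file is just scaffolding). It runs without -i so the file is left untouched, and the second command pairs the same address with p to show exactly which lines the range selects:

```shell
demo=$(mktemp)
printf '%s\n' 'first line' 'second line regex' 'third line' 'fourth line' > "$demo"

# Address /regex/,$ selects from the first match to the last line; d deletes those lines.
kept=$(sed '/regex/,$d' "$demo")
printf '%s\n' "$kept"        # prints: first line

# Same address, but -n suppresses default output and p prints only the selected lines.
selected=$(sed -n '/regex/,$p' "$demo")
printf '%s\n' "$selected"    # prints the second, third and fourth lines

rm -f "$demo"
```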

Perl: console / command-line tool for interactive code evaluation and testing

Python offers an interactive interpreter allowing the evaluation of little code snippets by submitting a couple of lines of code to the console. I was wondering if a tool with similar functionality (e.g. including a history accessible with the arrow keys) also exists for Perl?
There seem to be all kinds of solutions out there, but I can't seem to find any good recommendations. I.e. lots of tools are mentioned, but I'm interested in which tools people actually use and why. So, do you have any good recommendations, excluding the standard perl debugging (perl -d -e 1)?
Here are some interesting pages I've had a look at:
a question in the official Perl FAQ
another Stackoverflow question, where the answer mostly is the perl debugger and several links are broken
Perl Console
Perl Shell
perl -d -e 1
is perfectly suitable; I've been using it for years and years. But if you just can't, then you can check out Devel::REPL.
If your problem with perl -d -e 1 is that it lacks command line history, then you should install Term::ReadLine::Perl which the debugger will use when installed.
Even though this question has plenty of answers, I'll add my two cents on the topic. My approach to the problem is easy if you are a ViM user, but I guess it can be done from other editors as well:
Open your ViM, and type your code. You don't need to save it on any file.
:w !perl for evaluation (:w !COMMAND pipes the buffer to the process obtained by running COMMAND. In this case the mighty perl interpreter!)
Take a look at the output
This approach is good for any interpreted language, not just for Perl.
In the case of Perl it is extremely convenient when you are writing your own modules, since in my experience the perl interpreter will refuse to reload a module (even when loading was attempted and failed). On the minus side, you will lose all your context every time, so if you are doing some heavy or slow operation, you need to save some intermediate results (whereas the perl console approach preserves the previously computed data).
If you just need to evaluate an expression, which is the other use case for a perl console program, another good alternative is to look at the output of a perl -e command. It's fast to launch, but you have to deal with escaping (for this, the $'...' syntax of Bash does the job pretty well).
To get history and arrow keys, just use:
rlwrap perl -de1

How to use sed to delete next line on Solaris

I am trying to use sed to find a matching pattern in a file and then delete only the next line.
Ex.
Location
New York <---- delete
USA
Location
London <---- delete
UK
I tried sed '/Location/{n; d}', which works on Linux but doesn't work on Solaris.
Thanks.
As I mentioned in my answer to your other sed question, Solaris sed is old-school and needs more hand-holding; to put it another way, it is fussier about its syntax.
All you need is an additional ';' char placed after the `d' char, i.e.
sed '/Location/{n; d;}'
More generally, anything that can be on a new-line inside {...} needs a semi-colon separator when it is rolled up onto a single line. However, you can't roll up the 'a', 'i', 'c' commands onto a single line as you can in Linux.
In Solaris standard sed, the 'a', 'i', 'c' commands need a trailing '\' with NO spaces or tabs after it, then as much data as you like (probably within some K limit) on \n-terminated lines (NO \r characters), followed by a blank line.
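The portable spellings described above look roughly like this. A sketch only: it follows the POSIX description of sed and has not been tested on Solaris, and the sample file and the appended text are made up for illustration:

```shell
input=$(mktemp)
printf '%s\n' 'Location' 'New York' 'USA' 'Location' 'London' 'UK' > "$input"

# One-line form: every command inside { }, including the last, ends with ';'.
one_line=$(sed '/Location/{n; d;}' "$input")

# Multi-line form: one command per line, closing brace on its own line.
multi_line=$(sed '/Location/{
n
d
}' "$input")

# 'a' in its traditional form: backslash, newline, then the text to append.
appended=$(sed '/Location/a\
city follows' "$input")

printf '%s\n' "$one_line"    # prints Location, USA, Location, UK
rm -f "$input"
```

The one-line and multi-line forms are meant to be equivalent; the multi-line form is the one least likely to trip up an older sed.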
Newer installations of Solaris may also have /usr/xpg4/bin/sed installed. Try
/usr/xpg4/bin/sed '/Location/{n; d}'
If you're lucky, it will support your shortcut syntax. I don't have access to any solaris machines anymore to test this.
Finally, if that doesn't work, there are packages of GNU tools that can be installed that would have a sed that is much more like what you're used to from Linux. Ask your sys-admins if GNU tools are already there, or if they can be installed. I'm not sure what version of gnu sed started to support 'relaxed' syntax, so don't assume that it will be fixed without testing :-)
I hope this helps.
You can append the next line to the current one and then remove everything that is not Location:
$ cat text
Location
New York <---- delete
USA
Location
London <---- delete
UK
$ sed '/Location/{N;s/Location.*$/Location/;}' text
Location
USA
Location
UK
I do not have a Solaris here so I would like to know if this works.
Does this AWK solution work for you?
[jaypal~/temp]$ cat a.txt
Location
New York <---- delete
USA
Location
London <---- delete
UK
Updated to preserve the empty line -
!NF is used to preserve the blank lines. It means that if the number of fields is 0, we just print the line. NF is a built-in variable that keeps track of the number of fields in a record. If we encounter a blank line, we print it, skip the rest of the processing and go to the next line.
!/Location/ prints every line that does not match Location. Printing is the implicit action in AWK whenever the pattern is true.
The third pattern/action prints the line when it matches the regex /Location/. After printing it, we call getline twice: the first call reads the line to be deleted, the second replaces it with the line after that, which we then print.
[jaypal~/temp]$ awk '!NF{print;next}; !/Location/; /Location/{print;getline;getline;print}' INPUT_FILE
Location
USA
Location
UK
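A shorter variant of the same idea, offered as a sketch rather than a drop-in replacement: print the matching line, read the following line with a single getline, and skip it with next. It assumes the line to delete always directly follows a Location line; the sample data (including the blank line) is made up to mirror the file above:

```shell
sample=$(mktemp)
printf '%s\n' 'Location' 'New York' 'USA' '' 'Location' 'London' 'UK' > "$sample"

# /Location/ : print the match, load the next line, then skip it via next.
# 1          : always-true pattern whose default action prints every other line,
#              blank lines included.
result=$(awk '/Location/ { print; getline; next } 1' "$sample")
printf '%s\n' "$result"    # prints Location, USA, an empty line, Location, UK

rm -f "$sample"
```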

What can awk do that sed can't?

I used sed for a batch process where I could not do it with awk. Could awk have done it? Or is it more a matter of choice, with awk and sed being equivalent for this usage? They both handle the common search-and-replace on input and output in a similar way. Is there a good example of something that can be done with one but not with the other?
One main difference is that an awk program can easily maintain state (in variables and arrays) and can make multiple passes over the same data. A sed invocation makes a single pass and has very little state to work with (essentially the pattern space and the hold space), because sed (Stream EDitor) is inherently stream-oriented. The advantage, though, is that this makes sed simpler and more appropriate for use in pipe chains.
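As an illustration of state and multiple passes, awk can be handed the same file twice: the first pass accumulates a total in a variable and the second pass uses it. A minimal sketch with made-up numbers:

```shell
data=$(mktemp)
printf '%s\n' 10 30 60 > "$data"

# NR == FNR is true only while the first file argument is being read.
# Pass 1 stores the running total; pass 2 prints each value and its share.
shares=$(awk '
    NR == FNR { total += $1; next }
    { printf "%d %d%%\n", $1, 100 * $1 / total }
' "$data" "$data")
printf '%s\n' "$shares"    # prints "10 10%", "30 30%", "60 60%", one per line

rm -f "$data"
```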
In the original (and still the best) book, The AWK Programming Language, the following are implemented (among many other things):
a simple assembler
recursive descent compiler
a text indexing program
Try doing that with sed.
G'day,
Awk is more powerful. sed tends to be more limited in what it can do.
Sed is good for line-based changes to data. It has some simple looping constructs, the usual ed/ex/vi regexp stuff and substitution things, compound statements, decisions etc. Most people use it for modifying piped data.
Awk is good for filtering or rearranging data. It mostly gets used for reporting.
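A small side-by-side sketch of the two typical uses, with invented sample data: sed making a line-based substitution, awk rearranging fields and producing a summary:

```shell
sales=$(mktemp)
printf '%s\n' 'apples 3' 'pears 5' 'apples 4' > "$sales"

# sed: line-based edit, replace the first "apples" on each line.
edited=$(sed 's/apples/APPLES/' "$sales")

# awk: swap the two columns.
swapped=$(awk '{ print $2, $1 }' "$sales")

# awk: report the total per item; the array "sum" is state kept across lines.
report=$(awk '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' "$sales" | sort)

printf '%s\n' "$report"    # prints "apples 7", then "pears 5"
rm -f "$sales"
```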
I'd suggest having a look at Dale Dougherty's excellent book "sed & awk". BTW, it's got one of the best explanations of regexps in there as well.
Many people would say use Perl anyway! (-:
Edit: Forgot to say that the awk language is quite C like which is no surprise given that the 'k' in awk stands for Brian Kernighan. Yes. That Brian Kernighan!
Also, sed only works on data streams whereas awk works on both data streams and files.
HTH
cheers,
Sed only works on data streams whereas awk works on both data streams and files.
The only thing I can think of is that sed can make small changes in fewer characters than awk. So for quick tweaks in a live shell, it's faster to type.
SED is a stream editor, and therefore does not have variables and a few other constructs like AWK has. AWK is a fully fledged language.