the 'd' command in the sed utility - sed

From the sed documentation:
d Delete the pattern space; immediately start next cycle.
What does it mean by next cycle? My understanding is that sed will not apply the following commands after the d command and it starts to read the next line from the input stream and processes it. But it seems that this is not true. See this example:
[root@localhost ~]# cat -A test.txt
aaaaaaaaaaaaaa$
$
bbbbbbbbbbbbb$
$
$
ccccccccc$
ddd$
$
eeeeeee$
[root@localhost ~]# cat test.txt | sed '/^$/d;p;p'
aaaaaaaaaaaaaa
aaaaaaaaaaaaaa
aaaaaaaaaaaaaa
bbbbbbbbbbbbb
bbbbbbbbbbbbb
bbbbbbbbbbbbb
ccccccccc
ccccccccc
ccccccccc
ddd
ddd
ddd
eeeeeee
eeeeeee
eeeeeee
[root@localhost ~]#
If d immediately started the next cycle, I would expect the p commands to produce no output.
Can anyone explain this, please? Thanks.

It means that sed will read the next line and start processing it.
Your test script doesn't do what you think. It matches the empty lines and applies the delete command to them. They don't appear, so the print statements don't get applied to the empty lines. The two print commands aren't connected to the pattern for the delete command, so the non-empty lines are printed three times. If you instead try
sed '/./d;p;p' test.txt # matches all non-empty lines
nothing will be printed other than the blank lines, three times each.
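To see that commands after d really are skipped for a deleted line, here is a minimal sketch. The three-line sample input and the "> " prefix are made up for illustration; the s command stands in for the two p commands.

```shell
# Hypothetical sample: a blank line between two non-empty lines.
input=$(printf 'a\n\nb\n')

# For the blank line, d ends the cycle at once, so s never runs on it
# and nothing is auto-printed. The other lines skip d and reach s.
out=$(printf '%s\n' "$input" | sed '/^$/d;s/^/> /')
printf '%s\n' "$out"
```

Only the two non-empty lines come out, each with the prefix; the blank line produces no output at all.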

a) You can combine multiple commands for one pattern with curly braces:
sed '/^$/{d;p;p}' test.txt
aaaaaaaaaaaaaa
bbbbbbbbbbbbb
ccccccccc
ddd
eeeeeee
In '/^$/d;p;p' the d command applies only to empty lines; every other line is printed two additional times. To bind the p commands to the pattern as well, group them in curly braces. The p commands are then skipped for empty lines too, but because d jumps to the next cycle, not because the pattern fails to match.
b) Useless use of cat. (already shown)

Related

Combining sed commands in bash

I am aiming to try and combine the two following sed commands to print out one output. The first command is used to strip the HTML file of its HTML tags and the second is to specify I only want lines 11 through to 16 of the file.
sed -e 's/<[^>]*.//g' file.html
sed -n '11,16p' file.html
I have been playing around with this for a while now and can only ever seem to get the output of lines 11-16 with the HTML tags, or all lines without the HTML, when I am aiming to display the output of lines 11-16 without any HTML tags. Any help would be greatly appreciated, thanks!
The naive way would be to use a pipe:
sed 's/<[^>]*.//g' file.html | sed -n '11,16p'
You may also combine the address and the pattern:
sed -n '11,16 s/<[^>]*.//pg' file.html
Here,
-n will suppress the default line output
11,16 - will set the address, Lines 11 through 16
s/<[^>]*.// - will look for <, then zero or more chars other than > and then any one char (did you mean a >?)
p - print the result of the substitution
g - all occurrences on the line
An online demo (shortened version, Lines 2-4):
#!/bin/bash
s="<111111>aaa<111111>
<22222>bbb<111111>
<33333>ccc<111111>
<44444>ddd<111111>
<55555>eee<111111>"
sed -n '2,4 s/<[^>]*.//pg' <<< "$s"
Output:
bbb
ccc
ddd
If GNU-compatible,
sed -n '11,16{ s/<[^>]*.//g; p; }; 17q;' file.html
The range will take a block, allowing both commands to be done sequentially to each line.
The 17q; just keeps it from wasting time on lines you already know you don't need.
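A small sketch of the grouped form on inline data. It assumes the intended pattern is <[^>]*> (ending on a literal >, as the parenthetical question above suggests), and the four HTML lines are invented sample input with the range shortened to lines 2-3.

```shell
# Hypothetical four-line HTML sample.
html=$(printf '%s\n' \
    '<h1>Title</h1>' \
    '<p>first <b>bold</b></p>' \
    '<p>second</p>' \
    '<p>third</p>')

# Within lines 2-3: strip every tag, then print the line.
# -n suppresses the default output for all other lines.
out=$(printf '%s\n' "$html" | sed -n '2,3{s/<[^>]*>//g;p;}')
printf '%s\n' "$out"
```

One difference between the two forms: with the p flag on s, a line in the range is printed only if the substitution succeeded, whereas the separate p inside the braces prints every line in the range, tags or not.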

How to find only the first and last line of a file using sed

I have a file called error_log for the apache and I want to see the first line and the last line of this file using sed command. Would you please help me how can I do that?
I know how to do that with head and tail commands, but I'm curious if it's possible in sed command too.
I have read the man sed and have googled a lot but nothing is found unfortunately.
This might work for you (GNU sed):
sed '1b;$b;d' file
Most sed commands can be prefixed by either an address or a regexp. An address is either a line number or the $ which represents the last line. If neither an address nor a regexp is present, the command applies to every line.
In the normal sed cycle, each line of input (less its newline) is placed in the pattern space. The sed commands are then applied, and the final act of the cycle is to re-attach the newline and print the result.
The b command controls command flow; if by itself it jumps out of the following sed commands to the final act of the cycle i.e. where the newline is re-attached and the result printed.
The d command deletes the pattern space and since there is nothing to be printed no further processing is executed (including re-attaching the newline and printing the result).
Thus the solution above prints the first line and the last and deletes the rest.
Sed has some command line options, one of which, -n, turns off the implicit printing of the pattern space. The p command prints the current state of the pattern space. Thus the dual of the above solution is:
sed -n '1p;$p' file
N.B. If the input file is only one line the first solution will only print one line whereas the second solution will print the same line twice. Also if more than one file is input both solutions will print the first line of the first file and last line of the last file unless the -i option is in place, in which case each file will be amended. The -s option replicates this without amending each file but streams the results to stdout as if each file is treated separately.
This will work:
sed -n '1p ; $p' error_log
1p will print the first line and $p will print the last line.
As a suggestion, take a look at info sed, not only man sed. You can find some examples related to your question in paragraph 2.1.
First line:
sed '2,$d' error_log
Last line:
sed '$!d' error_log
Based on your new requirement to output nothing if the input file is just 1 line (see How to find only the first and last line of a file using sed):
awk 'NR==1{first=$0} {last=$0} END{if (NR>1) print first ORS last}'
Original answer:
This is one of those things that you can, at face value, do easily enough in sed:
$ seq 3 7
3
4
5
6
7
$ seq 3 7 | sed -n '1p; $p'
3
7
but then how to handle edge cases like one line of input is non-obvious, e.g. is this REALLY the correct output:
$ printf 'foo\n' | sed -n '1p; $p'
foo
foo
or is the correct output just:
foo
and if the latter, how do you tweak that sed command to produce that output? @potong suggested a GNU sed command:
$ printf 'foo\n' | sed '1b;$b;d'
foo
which works but may be GNU-only (idk) and more importantly doesn't look much like the command we started with so the tiniest change in requirements meant a complete rewrite using different constructs.
Now, how about if you want to enhance it to, say, only print the first and last line if the file contained foo? I expect that'd be another challenging exercise with sed and probably involve non-portable constructs too.
It's just all pointless to learn how to do this with sed when you can use a different tool like awk and do whatever you like in a simple, consistent, portable syntax:
$ seq 3 7 |
awk 'NR==1{first=$0} {last=$0} END{print first ORS last}'
3
7
$ printf 'foo\n' |
awk 'NR==1{first=$0} {last=$0} END{print first ORS last}'
foo
foo
$ printf 'foo\n' |
awk 'NR==1{first=$0} {last=$0} END{print first (NR>1 ? ORS last : "")}'
foo
$ printf '3\nfoo\n7\n' |
awk 'NR==1{first=$0} /foo/{f=1} {last=$0} END{if (f) print first (NR>1 ? ORS last : "")}'
3
7
$ printf '3\nbar\n7\n' |
awk 'NR==1{first=$0} /foo/{f=1} {last=$0} END{if (f) print first (NR>1 ? ORS last : "")}'
$
Notice that:
Every command looks like every other command.
A minor change in requirements leads to a minor change in the code, not a complete rewrite.
Once you learn how to do any given thing A, how to do similar things B, C, D, etc. just builds on top of the syntax you already used, you don't have to learn a completely different syntax.
Each of those commands will work using any awk in any shell on every UNIX box.
Now, how about if you want to do that for multiple files such as would be created by the following commands?
$ seq 3 7 > file1
$ seq 12 25 > file2
With awk you can just store the lines in an array for printing in the END:
$ awk 'FNR==1{first[++cnt]=$0} {last[cnt]=$0}
END{for (i=1;i<=cnt;i++) print first[i] ORS last[i]}' file1 file2
3
7
12
25
or with GNU awk you can print them from ENDFILE:
$ awk 'FNR==1{first=$0} {last=$0} ENDFILE{print first ORS last}' file1 file2
3
7
12
25
With sed? An exercise left for the reader.
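For what it is worth, the -s option mentioned in an earlier answer covers the multi-file case, provided GNU sed is available. This is only a sketch under that assumption; the temporary directory and the shortened file contents are made up for the example.

```shell
# Assumes GNU sed: -s (--separate) treats each input file as its own
# stream, so line numbers restart and $ matches the last line per file.
dir=$(mktemp -d)
printf '%s\n' 3 4 5 6 7 > "$dir/file1"
printf '%s\n' 12 13 25 > "$dir/file2"

out=$(sed -s -n '1p;$p' "$dir/file1" "$dir/file2")
printf '%s\n' "$out"

rm -r "$dir"
```

The one-line edge case discussed above still applies: a file with a single line has that line printed twice by this command.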

Sed inside a while read loop

I have been reading a lot of questions and answers about using sed within a while loop. I think I have the command down correctly, but I seem to get no output once I put all of the pieces together. Can someone tell me what I am missing?
I have an input file with 700 variables, one on each line. I need to use each of these 700 variables within a sed command. I run the following command to verify variables are outputting correctly:
cat Input_File.txt | while read var; do echo $var; done
I then try to add in the sed command as follows:
cat Input_File.txt | while read var; do sed -n "/$var/,+10p" Multi_BLAST_5814.txt >> Multi_BLAST_Subset; done
This command leaves me without an error, but a blinking cursor as if this is an infinite loop. It should use each of the 700 variables, find the corresponding line in Multi_BLAST_5814.txt and output the search variable line and the 10 lines after the search term into a new file, appending each as it goes. I can execute the sed command alone with a manually set single value variable successfully and I can execute the while loop successfully using the input file. Anyone have a thought as to why this is not working?
User, that is exactly what I have done to this point.
I have a large text file (128 MB) with BLAST output. I need to search through this for a subset of results for 769 samples (Out of the 5814 samples that are in the file).
I have created a .txt file with those 769 sample names.
To test grep and sed, I manually assigned a variable with one of the 769 samples names I need to search and can get the results I need as follows:
$ Otu="S41_Folmer_Otu96;size=12;"
$ grep $Otu -A 10 Multi_BLAST_5814.txt
OR
$ sed -n "/$Otu/,+10p" Multi_BLAST_5814.txt
The OUTPUT is exactly what I want as follows:
Query= S41_Folmer_Otu96;size=12;
Length=101
Sequences producing significant alignments: Score(Bits) E Value
gi|58397553|gb|AY830431.1| Scopelocheirus schellenbergi clone... 180 1E-41
gi|306447543|gb|HQ018876.1| Liposcelis paeta isolate CZ cytoc... 174 6E-40
gi|306447533|gb|HQ018871.1| Liposcelis decolor isolate CQ cyt... 104 9E-19
gi|1043259532|gb|KX130860.1| Batocera rufomaculata isolate Br... 99 4E-17
gi|987210821|gb|KR141076.1| Psocoptera sp. BOLD:ACO1391 vouch... 81 1E-11
To Test to make sure the input file contains the correct variables I run the following:
$ cat Input_File.txt
$ while read Otu; do echo $Otu; done <Input_File.txt
S41_Folmer_Otu96;size=12;
S78_Folmer_Otu15;size=538;
S73_Leray_Otu52;size=6;
S66_Leray_Otu93;size=6;
S10_Folmer_Otu10;size=1612;
... All 769 variables
Again, this is exactly what I expect and is correct.
But when I run either of the following commands, nothing is printed to the screen (if I leave off the write file/append action) or to the file I need to create.
$ cat Input_File.txt | while read Otu; do grep "$Otu" -A 10 Multi_BLAST_5814.txt >> Multi_BLAST_Subset.txt; done
$ cat Input_File.txt | while read Otu; do sed -n "/$Otu/,+10p" Multi_BLAST_5814.txt >> Multi_BLAST_Subset.txt; done
Sed hangs and never closes, leaving me at a blinking cursor. Grep finishes but also gives no output. I am at a loss as to why this is not working. Everything works individually, so I may be left with manually searching all 769 samples copy/paste.
If you have access to GNU grep no need for a sed command, grep "$var" -A 10 will do the same thing and won't break if $var contains the delimiter used in your sed command.
From man grep :
-A NUM, --after-context=NUM
Print NUM lines of trailing context after matching lines.
Places a line containing a group separator (--) between
contiguous groups of matches. With the -o or --only-matching
option, this has no effect and a warning is given.
Not sure whether you have already attempted it but try breaking the problem into smaller chunks. Simple example below :
$ cat Input_File.txt
one
two
three
$
$ cat file.txt
This is line one
This is line two
This is line three
This is another four
This is another five
This is another six
This is another seven
$
$ cat Input_File.txt | while read var ; do echo $var ; sed -n "/$var/,+1p" file.txt ; done
one
This is line one
This is line two
two
This is line two
This is line three
three
This is line three
This is another four
$
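The same small test can be sketched with grep instead of sed. This is an illustration, not a diagnosis of the original hang: it adds -F so each name is matched as a fixed string rather than a regexp, reads with IFS= and -r so names are taken verbatim, and redirects the input file into the loop instead of piping from cat. The temporary directory and the shortened file.txt are assumptions for the example.

```shell
dir=$(mktemp -d)
printf '%s\n' one two three > "$dir/Input_File.txt"
printf '%s\n' 'This is line one' 'This is line two' \
    'This is line three' 'This is another four' > "$dir/file.txt"

# For each name: print the matching line plus one line of trailing context.
out=$(while IFS= read -r var; do
    grep -F -A 1 -- "$var" "$dir/file.txt"
done < "$dir/Input_File.txt")
printf '%s\n' "$out"

rm -r "$dir"
```

If separate output per name is not required, grep can also take all names at once with -f Input_File.txt and scan the large file in a single pass; overlapping context blocks are then merged.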

Multiple lines from one with sed

Is there some way in sed to create multiple output lines from a single input line? I have a template file (there are more lines in the file, I'm just simplifying it):
http://hostname:#PORT#
I am currently using sed to replace #PORT# with a real port. However, I'd like to be able to pass in multiple ports, and have sed create a line for each. Is that possible?
I'm assuming you would want to duplicate the whole line for each port number. In that case it's easier to think of it as replacing the port numbers with the URL:
$ cat ports.in
1
2
3
4
5
$ sed 's#^\([0-9]*\)$#http://hostname:\1#' ports.in
http://hostname:1
http://hostname:2
http://hostname:3
http://hostname:4
http://hostname:5
To do it the other way around is easier with awk:
$ cat url.in
http://hostname:#PORT#
$ awk '/^[0-9]/ {ports[++i]=$0} /^http/ {sub(":#PORT#", ":%d\n"); for (p in ports) printf($0, ports[p])}' ports.in url.in
http://hostname:2
http://hostname:3
http://hostname:4
http://hostname:5
http://hostname:1
This reads both ports.in and url.in. If a line starts with a number, it is assumed to be a port number from ports.in. Otherwise, if the line starts with http, it is assumed to be a URL from url.in; the port placeholder is replaced with a printf format string and the URL is printed once for each port number read. It will fail to do the right thing if the files are fed in the wrong order.
A similar solution, but taking the URL from a shell variable:
$ myurl="http://hostname:#PORT#"
$ awk -v url="$myurl" 'BEGIN{sub(":#PORT#", ":%d\n",url)} /^[0-9]/ {ports[++i]=$0} END {for (p in ports) printf(url, ports[p])}' ports.in
http://hostname:2
http://hostname:3
http://hostname:4
http://hostname:5
http://hostname:1
It seems you have multiple templates and multiple ports to apply to them. Here's how to do it in a shell script (tested with bash), but you'll need to do it in two sed executions if you want to keep it simple because you have two multiply valued inputs. It is mathematically a cross product of the templates and the substitution values.
ports='80
8080
8081'
templates='http://domain1.net:%PORT/
http://domain2.org:%PORT/
http://domain3.com:%PORT/'
meta="s/(.*)/g; s|%PORT|\1|p; /p"
sed="`echo \"$ports\" |sed -rn \"$meta\" |tr '\n' ' '`"
echo "$templates" |sed -rn "h; $sed"
The shell variable meta is a meta sed script because it writes another sed script. The h saves the pattern buffer in the sed hold space. The sed commands generated from the meta sed recall, substitute, and print for each port. This is the result.
http://domain1.net:80/
http://domain1.net:8080/
http://domain1.net:8081/
http://domain2.org:80/
http://domain2.org:8080/
http://domain2.org:8081/
http://domain3.com:80/
http://domain3.com:8080/
http://domain3.com:8081/
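To make the meta script less opaque, this sketch prints the sed script it generates and then runs it on a single template. It uses -E in place of -r (both select extended regexps in GNU sed), two ports instead of three, and a made-up template URL.

```shell
ports='80
8080'
# Same meta script as above: each port line is rewritten into
# "g; s|%PORT|<port>|p; " (recall the saved line, substitute, print).
meta="s/(.*)/g; s|%PORT|\1|p; /p"
generated=$(echo "$ports" | sed -En "$meta" | tr '\n' ' ')
printf 'generated script: %s\n' "$generated"

# h saves the template in the hold space before the generated commands run.
out=$(echo 'http://example.test:%PORT/' | sed -En "h; $generated")
printf '%s\n' "$out"
```

Each generated g restores the untouched template from the hold space, which is why the second substitution still finds %PORT after the first one has replaced it.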

Have sed ignore non-matching lines

How can I make sed filter matching lines according to some expression, but ignore non-matching lines, instead of letting them print?
As a real example, I want to run scalac (the Scala compiler) on a set of files, and read from its -verbose output the .class files created. scalac -verbose outputs a bunch of messages, but we're only interested in those of the form [wrote some-class-name.class].
What I'm currently doing is this (|& is bash 4.0's shorthand for 2>&1 |, which pipes stderr along with stdout to the next program):
$ scalac -verbose some-file.scala ... |& sed 's/^\[wrote \(.*\.class\)\]$/\1/'
This will extract the file names from the messages we're interested in, but will also let all other messages pass through unchanged! Of course we could instead do this:
$ scalac -verbose some-file.scala ... |& grep '^\[wrote .*\.class\]$' |
sed 's/^\[wrote \(.*\.class\)\]$/\1/'
which works but looks very much like going around the real problem, which is how to instruct sed to ignore non-matching lines from the input. So how do we do that?
If you don't want to print lines that don't match, you can use the combination of
-n option which tells sed not to print
p flag which tells sed to print what is matched
This gives:
sed -n 's/.../.../p'
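Applied to the expression from the question, that could look like the sketch below. The four log lines are invented stand-ins for scalac -verbose output, not real compiler messages.

```shell
# Hypothetical stand-in for scalac -verbose output.
log=$(printf '%s\n' \
    '[parsing Foo.scala]' \
    '[wrote Foo.class]' \
    '[total in 120ms]' \
    '[wrote Bar.class]')

# -n suppresses the default output; the p flag prints only lines
# on which the substitution actually happened.
out=$(printf '%s\n' "$log" | sed -n 's/^\[wrote \(.*\.class\)\]$/\1/p')
printf '%s\n' "$out"
```

Only the two class file names are printed; the other messages are dropped without needing a separate grep.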
Another way with plain sed:
sed -e 's/.../.../;t;d'
s/// is a substitution, t without any label branches to the end of the script (skipping all following commands) if the substitution succeeded, and d deletes the line.
No need for perl or grep.
(edited after Nicholas Riley's suggestion)
Rapsey raised a relevant point about multiple substitutions expressions.
First, quoting a Unix SE answer, you can "prefix most sed commands with an address to limit the lines to which they apply".
Second, you can group commands within curly braces {} (separated with a semi-colon ; or a new line)
Third, add the print flag p on the last substitution
Syntax:
sed -n -e '/^given_regexp/ {s/regexp1/replacement1/flags1;[...];s/regexpn/replacementn/flagsnp}'
Example (see Here document for more details):
Code:
sed -n -e '/^ha/ {s/h/k/g;s/a/e/gp}' <<SAMPLE
haha
hihi
SAMPLE
Result:
keke
sed -n '/.../!p'
There is no need for a substitution.