Combining sed commands in bash - sed

I am aiming to try and combine the two following sed commands to print out one output. The first command is used to strip the HTML file of its HTML tags and the second is to specify I only want lines 11 through to 16 of the file.
sed -e 's/<[^>]*.//g' file.html
sed -n '11,16p' file.html
I have been playing around with this for a while now and can only ever seem to get the output of lines 11-16 with the HTML tags, or all lines without the HTML, when I am aiming to display the output of lines 11-16 without any HTML tags. Any help would be greatly appreciated, thanks!

The naive way would be to use a pipe:
sed 's/<[^>]*.//g' file.htm | sed -n '11,16p'
You may also combine the address and the pattern:
sed -n '11,16 s/<[^>]*.//pg' file.html
Here,
-n will suppress the default line output
11,16 - will set the address, Lines 11 through 16
s/<[^>]*.// - will look for <, then zero or more chars other than > and then any one char (did you mean a >?)
p - print the result of the substitution
g - all occurrences on the line
An online demo (shortened version, Lines 2-4):
#!/bin/bash
s="<111111>aaa<111111>
<22222>bbb<111111>
<33333>ccc<111111>
<44444>ddd<111111>
<55555>eee<111111>"
sed -n '2,4 s/<[^>]*.//pg' <<< "$s"
Output:
bbb
ccc
ddd

If GNU-compatible,
sed -n '11,16{ s/<[^>]*.//g; p; }; 17q;' file.html
The range will take a block, allowing both commands to be done sequentially to each line.
The 17q; just keeps it from wasting time on lines you already know you don't need.

Related

How to get block printed out with sed

The problem
I have the following input file:
test.txt
This is a text2
This is a text3
This is a text4
This is a text5
START
This is a text1
END
This is a text6
This is a text7
This is a text8
This is a text9
I'd like to print the text between the START and END block with sed to
practice with it a little bit but I'm so confused how to achieve that.
What I've tried
I tried the following commands:
cat $(test.txt) | sed -n -e '{/START/=},{/END/=}p'
I hoped here that the {/START/=} and {/END/=} blocks return the line numbers
where the START and END block are.
cat $(test.txt) | sed -n -e '$(sed -n -e "/START/="),$(sed -n -e "/END/=")p'
I tried to get the line numbers of the START and END blocks by embedding
them into $(...).
I'm getting out of ideas.
What would you recommend to use for?
Anyways, I'm also wondering if using sed is the best tool for this task. Do you
have any suggestions for other tools which suit more for this task? I'd like to
see an example as well, thanks!
sed -ne '/START/,/END/p' test.txt
Is the typical solution. This will apply the p command to all lines between (and including) line that matches START until the line that matches END.
Excluding the start and end of the range is not very clean in sed. One approach is to explicitly match them:
sed -ne '/START/,/END/{/START/d; /END/d; p;}' test.txt
but for this particular use case it's probably cleaner to include those lines in the range of lines that are explicitly deleted:
sed -e '1,/START/d' -e '/END/,$d' test.txt

Multiple lines from one with sed

Is there some way in sed to create multiple output lines from a single input line? I have a template file (there are more lines in the file, I'm just simplifying it):
http://hostname:#PORT#
I am currently using sed to replace #PORT# with a real port. However, I'd like to be able to pass in multiple ports, and have sed create a line for each. Is that possible?
I'm assuming you would want to duplicate the whole line for each port number. In that case it's easier to think of it as replacing the port numbers with the URL:
$ cat ports.in
1
2
3
4
5
$ sed 's#^\([0-9]*\$)#http://hostname:\1#' ports.in
http://hostname:1
http://hostname:2
http://hostname:3
http://hostname:4
http://hostname:5
To do it the other way around is easier with awk:
$ cat url.in
http://hostname:#PORT#
$ awk '/^[0-9]/ {ports[++i]=$0} /^http/ {sub(":#PORT#", ":%d\n"); for (p in ports) printf($0, ports[p])}' ports.in url.in
http://hostname:2
http://hostname:3
http://hostname:4
http://hostname:5
http://hostname:1
This reads both ports.in and url.in, and if a line starts with a number it is assumed that it's a port number from ports.in. Otherwise, if the line starts with http it's assumed to be an URL from url.in and will replace the port placeholder with a printf formatting string and then print the URL once for each port number read. It will fail to do the right thing if the files are fed in the wrong order.
A similar solution, but taking the URL from a shell variable:
$ myurl="http://hostname:#PORT#"
$ awk -v url="$myurl" 'BEGIN{sub(":#PORT#", ":%d\n",url)} /^[0-9]/ {ports[++i]=$0} END {for (p in ports) printf(url, ports[p])}' ports.in
http://hostname:2
http://hostname:3
http://hostname:4
http://hostname:5
http://hostname:1
It seems you have multiple templates and multiple ports to apply to them. Here's how to do it in a shell script (tested with bash), but you'll need to do it in two sed executions if you want to keep it simple because you have two multiply valued inputs. It is mathematically a cross product of the templates and the substitution values.
ports='80
8080
8081'
templates='http://domain1.net:%PORT/
http://domain2.org:%PORT/
http://domain3.com:%PORT/'
meta="s/(.*)/g; s|%PORT|\1|p; /p"
sed="`echo \"$ports\" |sed -rn \"$meta\" |tr '\n' ' '`"
echo "$templates" |sed -rn "h; $sed"
The shell variable meta is a meta sed script because it writes another sed script. The h saves the pattern buffer in the sed hold space. The sed commands generated from the meta sed recall, substitute, and print for each port. This is the result.
http://domain1.net:80/
http://domain1.net:8080/
http://domain1.net:8081/
http://domain2.org:80/
http://domain2.org:8080/
http://domain2.org:8081/
http://domain3.com:80/
http://domain3.com:8080/
http://domain3.com:8081/

Using sed to keep the beginning of a line

I have a file in which some lines start by a >
For these lines, and only these ones, I want to keep the first eleven characters.
How can I do that using sed ?
Or maybe something else is better ?
Thanks !
Muriel
Let's start with this test file:
$ cat file
line one with something or other
>1234567890abc
other line in file
To keep only the first 11 characters of lines starting with > while keeping all other lines:
$ sed -r '/^>/ s/(.{11}).*/\1/' file
line one with something or other
>1234567890
other line in file
To keep only the first eleven characters of lines starting with > and deleting all other lines:
$ sed -rn '/^>/ s/(.{11}).*/\1/p' file
>1234567890
The above was tested with GNU sed. For BSD sed, replace the -r option with -E.
Explanation:
/^>/ is a condition. It means that the command which follows only applies to lines that start with >
s/(.{11}).*/\1/ is a substitution command. It replaces the whole line with just the first eleven characters.
-r turns on extended regular expression format, eliminating the need for some escape characters.
-n turns off automatic printing. With -n in effect, lines are only printed if we explicitly ask them to be printed. In the second case above, that is done by adding a p after the substitute command.
Other forms:
$ sed -r 's/(>.{10}).*/\1/' file
line one with something or other
>1234567890
other line in file
And:
$ sed -rn 's/(>.{10}).*/\1/p' file
>1234567890

sed + remove "#" and empty lines with one sed command

how to remove comment lines (as # bal bla ) and empty lines (lines without charecters) from file with one sed command?
THX
lidia
If you're worried about starting two sed processes in a pipeline for performance reasons, you probably shouldn't be, it's still very efficient. But based on your comment that you want to do in-place editing, you can still do that with distinct commands (sed commands rather than invocations of sed itself).
You can either use multiple -e arguments or separate commands with a semicolon, something like (just one of these, not both):
sed -i 's/#.*$//' -e '/^$/d' fileName
sed -i 's/#.*$//;/^$/d' fileName
The following transcript shows this in action:
pax> printf 'Line # with a comment\n\n# Line with only a comment\n' >file
pax> cat file
Line # with a comment
# Line with only a comment
pax> cp file filex ; sed -i 's/#.*$//;/^$/d' filex ; cat filex
Line
pax> cp file filex ; sed -i -e 's/#.*$//' -e '/^$/d' filex ; cat filex
Line
Note how the file is modified in-place even with two -e options. You can see that both commands are executed on each line. The line with a comment first has the comment removed then all is removed because it's empty.
In addition, the original empty line is also removed.
#paxdiablo has a good answer but it can be improved.
(1) The '/^$/d' clause only matches 100% blank lines.
If you want to also match lines that are entirely whitespace (spaces, tabs etc.) use this instead:
'/^\s*$/d'
(2) The 's/#.*$//' clause only matches lines that start with the # character in column 0.
If you want to also match lines that have only whitespace before the first # use this instead:
'/^\s*#.*$/d'
The above criteria may not be universal (e.g. within a HEREDOC block, or in a Python multi-line string the different approaches could be significant), but in many cases the conventional definition of "blank" lines include whitespace-only, and "comment" lines include whitespace-then-#.
(3) Lastly, on OSX at least, the #paxdiablo solution in which the first clause turns comment lines into blank lines, and the second clause strips blank lines (including what were originally comments) doesn't work. It seems to be more portable to make both clauses /d delete actions as I've done.
The revised command incorporating the above is:
sed -e '/^\s*#.*$/d' -e '/^\s*$/d' inputFile
This tiny jewel removes all # comments, no matter where they begin in a line (see caution below):
sed -e 's/\s*#.*$//'
Example:
text="
this is a # test
#this is a test
#this is a #test
this is # another #test
"
$echo "$text" | sed -e 's/\s*#.*$//'
this is a
this is
Next this removes any resulting blank lines:
$echo "$text" | sed -e 's/\s*#.*$//' | sed -e '/^\s*$/d'
Caution: Depending on the syntax and/or interpretation of the lines your processing, this might not be an appropriate solution, as it just stupidly removes end of lines, even if the '#' is part of your data or code. However, for use cases where you'll never use a hash except for as an end of line comment then it works fine. So just as with all coding, context must be taken into consideration.
Alternative variant, using grep:
cat file.txt | grep -Ev '(#.*$)|(^$)'
you can use awk
awk 'NF{gsub(/^[ \t]*#/,"");print}' file
First example(paxdiablo) is very good except its not change file, just output result. If you want to change it inline:
sudo sed -i 's/#.*$//;/^$/d' inputFile
On (one of) my linux boxes, sed understands extended regular expressions with the -r option, so:
sed -r '/(^\s*#)|(^\s*$)/d' squid.conf.installed
is very useful for showing all non-blank, non comment lines.
The regex matches either start of line followed by zero or more spaces or tabs followed by either a hash or end of line, and deletes those matching lines from the input.

Have sed ignore non-matching lines

How can I make sed filter matching lines according to some expression, but ignore non-matching lines, instead of letting them print?
As a real example, I want to run scalac (the Scala compiler) on a set of files, and read from its -verbose output the .class files created. scalac -verbose outputs a bunch of messages, but we're only interested in those of the form [wrote some-class-name.class].
What I'm currently doing is this (|& is bash 4.0's way to pipe stderr to the next program):
$ scalac -verbose some-file.scala ... |& sed 's/^\[wrote \(.*\.class\)\]$/\1/'
This will extract the file names from the messages we're interested in, but will also let all other messages pass through unchanged! Of course we could do instead this:
$ scalac -verbose some-file.scala ... |& grep '^\[wrote .*\.class\]$' |
sed 's/^\[wrote \(.*\.class\)\]$/\1/'
which works but looks very much like going around the real problem, which is how to instruct sed to ignore non-matching lines from the input. So how do we do that?
If you don't want to print lines that don't match, you can use the combination of
-n option which tells sed not to print
p flag which tells sed to print what is matched
This gives:
sed -n 's/.../.../p'
Another way with plain sed:
sed -e 's/.../.../;t;d'
s/// is a substituion, t without any label conditionally skips all following commands, d deletes line.
No need for perl or grep.
(edited after Nicholas Riley's suggestion)
Rapsey raised a relevant point about multiple substitutions expressions.
First, quoting an Unix SE answer, you can "prefix most sed commands with an address to limit the lines to which they apply".
Second, you can group commands within curly braces {} (separated with a semi-colon ; or a new line)
Third, add the print flag p on the last substitution
Syntax:
sed -n -e '/^given_regexp/ {s/regexp1/replacement1/flags1;[...];s/regexp1/replacement1/flagsnp}'
Example (see Here document for more details):
Code:
sed -n -e '/^ha/ {s/h/k/g;s/a/e/gp}' <<SAMPLE
haha
hihi
SAMPLE
Result:
keke
sed -n '/.../!p'
There is no need for a substitution.