Have sed ignore non-matching lines - sed

How can I make sed filter matching lines according to some expression, but ignore non-matching lines, instead of letting them print?
As a real example, I want to run scalac (the Scala compiler) on a set of files, and read from its -verbose output the .class files created. scalac -verbose outputs a bunch of messages, but we're only interested in those of the form [wrote some-class-name.class].
What I'm currently doing is this (|& is bash 4.0's way to pipe stderr to the next program):
$ scalac -verbose some-file.scala ... |& sed 's/^\[wrote \(.*\.class\)\]$/\1/'
This will extract the file names from the messages we're interested in, but will also let all other messages pass through unchanged! Of course we could do instead this:
$ scalac -verbose some-file.scala ... |& grep '^\[wrote .*\.class\]$' |
sed 's/^\[wrote \(.*\.class\)\]$/\1/'
which works but looks very much like going around the real problem, which is how to instruct sed to ignore non-matching lines from the input. So how do we do that?

If you don't want to print lines that don't match, you can use the combination of
-n option which tells sed not to print
p flag which tells sed to print what is matched
This gives:
sed -n 's/.../.../p'

Another way with plain sed:
sed -e 's/.../.../;t;d'
s/// is a substituion, t without any label conditionally skips all following commands, d deletes line.
No need for perl or grep.
(edited after Nicholas Riley's suggestion)

Rapsey raised a relevant point about multiple substitutions expressions.
First, quoting an Unix SE answer, you can "prefix most sed commands with an address to limit the lines to which they apply".
Second, you can group commands within curly braces {} (separated with a semi-colon ; or a new line)
Third, add the print flag p on the last substitution
Syntax:
sed -n -e '/^given_regexp/ {s/regexp1/replacement1/flags1;[...];s/regexp1/replacement1/flagsnp}'
Example (see Here document for more details):
Code:
sed -n -e '/^ha/ {s/h/k/g;s/a/e/gp}' <<SAMPLE
haha
hihi
SAMPLE
Result:
keke

sed -n '/.../!p'
There is no need for a substitution.

Related

What is the significance of -n parameter passed to sed command?

Can someone please tell me how does sed -n '1!p work ? Below is the full command which I am using to sort my pods in k8s based on nodes they are assigned.
kubectl get pods -o wide --all-namespaces|sort -k8 -r| sed -n '1!p'
The above command works perfectly and the sed part removes the first header line from the final output.
I want to understand how does the sed part work and what's the significance of the parameters passed to sed?
From info sed:
-n:
By default, 'sed' prints out the pattern space at the end of each
cycle through the script (*note How 'sed' works: Execution Cycle.).
These options disable this automatic printing, and 'sed' only
produces output when explicitly told to via the 'p' command.
1p: print first line
1!p: do not print first line.
By default, sed will output every line it parses.
-n option is there to hide this output, and display only the lines specified with the p option.
In your exemple, sed -n '1!p' means "Display every line but first".
A more understandable example is when you want to search/replace with sed. If you want to see the whole resulting file you'll use this:
sed 's/from/to/g' file.txt
But if you only want to see which lines have been changed, use this:
sed -n 's/from/to/gp' file.txt

Parsing a line with sed using regular expression

Using sed I want to parse Heroku's log-runtime-metrics like this one:
2016-01-29T00:38:43.662697+00:00 heroku[worker.2]: source=worker.2 dyno=heroku.17664470.d3f28df1-e15f-3452-1234-5fd0e244d46f sample#memory_total=54.01MB sample#memory_rss=54.01MB sample#memory_cache=0.00MB sample#memory_swap=0.00MB sample#memory_pgpgin=17492pages sample#memory_pgpgout=3666pages
the desired output is:
worker.2: 54.01MB (54.01MB is being memory_total)
I could not manage although I tried several alternatives including:
sed -E 's/.+source=(.+) .+memory_total=(.+) .+/\1: \2/g'
What is wrong with my command? How can it be corrected?
The .+ after source= and memory_total= are both greedy, so they accept as much of the line as possible. Use [^ ] to mean "anything except a space" so that it knows where to stop.
sed -E 's/.+source=([^ ]+) .+memory_total=([^ ]+) .+/\1: \2/g'
Putting your content into https://regex101.com/ makes it really obvious what's going on.
I'd go for the old-fashioned, reliable, non-extended sed expressions and make sure that the patterns are not too greedy:
sed -e 's/.*source=\([^ ]*\) .*memory_total=\([^ ]*\) .*/\1: \2/'
The -e is not the opposite of -E, which is primarily a Mac OS X (BSD) sed option; the normal option for GNU sed is -r instead. The -e simply means that the next argument is an expression in the script.
This produces your desired output from the given line of data:
worker.2: 54.01MB
Bonus question: There are some odd lines within the stream, I can usually filter them out using a grep pipe like | grep memory_total. However if I try to use it along with the sed command, it does not work. No output is produced with this:
heroku logs -t -s heroku | grep memory_total | sed.......
Sometimes grep | sed is necessary, but it is often redundant (unless you are using a grep feature that isn't readily supported by sed, such as Perl regular expressions).
You should be able to use:
sed -n -e '/memory_total=/ s/.*source=\([^ ]*\) .*memory_total=\([^ ]*\) .*/\1: \2/p'
The -n means "don't print by default". The /memory_total=/ matches the lines you're after; the s/// content is the same as before. I removed the g suffix that was there previously; the regex would never match multiple times anyway. I added the p to print the line when the substitution occurs.

Clarification of 'sed' usage

I just blindly followed a command from a tutorial to rename several folders at a time. Can anyone explain the meaning of "p;s" given as the argument to sed's -e option.
[root#LinuxD delsure]# ls
ar1 ar2 ar3 ar4 ar5 ar6 ar7
[root#LinuxD delsure]# find . -type d -name "ar*"|sed -e "p;s/ar/AR/g"|xargs -n2 mv
[root#LinuxD delsure]# ls
AR1 AR2 AR3 AR4 AR5 AR6 AR7
A sed script (the bit following the -e option) can contain multiple commands, separated by ;
The script in your example uses the p command to print the pattern space (i.e. the line just read from the input) followed by the s command to perform a substitution on the pattern space.
By default (unless the pattern space is cleared or the -n option is given to sed) after processing each line the current pattern spaceline is printed again, so the result of the substitution will be printed.
Another way to write the same thing would be:
sed -e "p" -e "s/ar/AR/g"
This separates the commands into two scripts. Another way would be:
sed "p;s/ar/AR/g"
because if the only argument to sed is a script then the -e option is not needed
The argument to the -e option is a script consisting of two commands. The first is p, which prints the unadulterated input, the second is a standard, global substitution. So for input ar1, this should output
ar1
AR1
The other part of this trick is the -n2 option on xargs, which forces it to only use two arguments at a time (instead of as many as it can handle, which would produce very different results).
One way in bash:
$ ls
ar6 ar7
$ find . -name 'ar*' | while IFS= read -r file; do echo mv "$file" "${file^^}"; done
mv ./ar6 ./AR6
mv ./ar7 ./AR7
get rid of the "echo" when you're happy with the output.

Sed to find line numbers with regular expressions

I am trying to use unix sed command to find line numbers that match a particular regular expression. The pattern of my file is below
A<20 spaces>
<something>
<something>
..
..
A<20 spaces>
<soemthing>
<something>
I need all the line numbers of A<20 spaces>
I used sed -n '/A[ ]{20}/'= <file_name> but it does not work. If I manually type in twenty spaces it does work.
Can some one please tweak the above command to make it work.
The braces in the expression need to be escaped with backslashes:
% sed -n '/A[ ]\{20\}/=' test.txt
1
6
An alternative would be to use -E to interpret regular expressions as extended (modern) regular expressions:
% sed -nE '/A[ ]{20}/=' test.txt
1
6
Or potentially use grep instead, which takes fewer characters to specify the same search:
% grep -n 'A[ ]\{20\}' test.txt
The correct syntax would be /A \{20\}/ (and I'm failing to understand where you got your syntax from).
edit: repeat a space, not an A. not my day
use the -E or the -r switch for extended regexp
just to be sur of the content request is answered because it literraly mean "20 spaces" and not Twenty " " char that everyone understand due to the sed line sample failing (i guess this the good one so other reply are fine in this case)
sed -n "/<20 spaces>/ =" file_name

sed + remove "#" and empty lines with one sed command

how to remove comment lines (as # bal bla ) and empty lines (lines without charecters) from file with one sed command?
THX
lidia
If you're worried about starting two sed processes in a pipeline for performance reasons, you probably shouldn't be, it's still very efficient. But based on your comment that you want to do in-place editing, you can still do that with distinct commands (sed commands rather than invocations of sed itself).
You can either use multiple -e arguments or separate commands with a semicolon, something like (just one of these, not both):
sed -i 's/#.*$//' -e '/^$/d' fileName
sed -i 's/#.*$//;/^$/d' fileName
The following transcript shows this in action:
pax> printf 'Line # with a comment\n\n# Line with only a comment\n' >file
pax> cat file
Line # with a comment
# Line with only a comment
pax> cp file filex ; sed -i 's/#.*$//;/^$/d' filex ; cat filex
Line
pax> cp file filex ; sed -i -e 's/#.*$//' -e '/^$/d' filex ; cat filex
Line
Note how the file is modified in-place even with two -e options. You can see that both commands are executed on each line. The line with a comment first has the comment removed then all is removed because it's empty.
In addition, the original empty line is also removed.
#paxdiablo has a good answer but it can be improved.
(1) The '/^$/d' clause only matches 100% blank lines.
If you want to also match lines that are entirely whitespace (spaces, tabs etc.) use this instead:
'/^\s*$/d'
(2) The 's/#.*$//' clause only matches lines that start with the # character in column 0.
If you want to also match lines that have only whitespace before the first # use this instead:
'/^\s*#.*$/d'
The above criteria may not be universal (e.g. within a HEREDOC block, or in a Python multi-line string the different approaches could be significant), but in many cases the conventional definition of "blank" lines include whitespace-only, and "comment" lines include whitespace-then-#.
(3) Lastly, on OSX at least, the #paxdiablo solution in which the first clause turns comment lines into blank lines, and the second clause strips blank lines (including what were originally comments) doesn't work. It seems to be more portable to make both clauses /d delete actions as I've done.
The revised command incorporating the above is:
sed -e '/^\s*#.*$/d' -e '/^\s*$/d' inputFile
This tiny jewel removes all # comments, no matter where they begin in a line (see caution below):
sed -e 's/\s*#.*$//'
Example:
text="
this is a # test
#this is a test
#this is a #test
this is # another #test
"
$echo "$text" | sed -e 's/\s*#.*$//'
this is a
this is
Next this removes any resulting blank lines:
$echo "$text" | sed -e 's/\s*#.*$//' | sed -e '/^\s*$/d'
Caution: Depending on the syntax and/or interpretation of the lines your processing, this might not be an appropriate solution, as it just stupidly removes end of lines, even if the '#' is part of your data or code. However, for use cases where you'll never use a hash except for as an end of line comment then it works fine. So just as with all coding, context must be taken into consideration.
Alternative variant, using grep:
cat file.txt | grep -Ev '(#.*$)|(^$)'
you can use awk
awk 'NF{gsub(/^[ \t]*#/,"");print}' file
First example(paxdiablo) is very good except its not change file, just output result. If you want to change it inline:
sudo sed -i 's/#.*$//;/^$/d' inputFile
On (one of) my linux boxes, sed understands extended regular expressions with the -r option, so:
sed -r '/(^\s*#)|(^\s*$)/d' squid.conf.installed
is very useful for showing all non-blank, non comment lines.
The regex matches either start of line followed by zero or more spaces or tabs followed by either a hash or end of line, and deletes those matching lines from the input.