I would like to use sed to remove all comments from a text file. Let's say that a comment starts with the '%' character and ends at the newline character. I would like to remove everything from the '%' to the end of the line, including the newline character. However, I don't want to remove comments starting with "%%".
Sample input:
%% comment to do not delete
% comment to delete
% another comment to delte
%% comment to do not delete
Some text % comment to delete
and some more text %% comment to do not delete
Desired output:
%% comment to do not delete
%% comment to do not delete
Some text and some more text %% comment to do not delete
Try this:
$ perl -pe '/^[^%]*%%/ && next; s/%.*\n//g' file.txt
Output
%% comment to do not delete
%% comment to do not delete
Some text and some more text %% comment to do not delete
Note
If you need to change the file in place, add the -i switch (after testing), like so:
$ perl -i -pe '/^[^%]*%%/ && next; s/%.*\n//g' file.txt
Thanks to scrutinizer for contributing.
Perfect application of perl's negative look-behind assertion:
perl -pe 's/(?<!%)%(?!%).*$//s' << END
%% comment to do not delete
% comment to delete
% another comment to delte
%% comment to do not delete
Some text % comment to delete
and some more text %% comment to do not delete
END
outputs
%% comment to do not delete
%% comment to do not delete
Some text and some more text %% comment to do not delete
The s flag ensures the dot will match a newline to achieve the "line joining" as requested.
This kind of regex matching can cause you problems, for instance if you have a line like
The date is `date +%Y%m%d` % this is a comment
You will end up with
The date is `date +
If your actual comment requires whitespace around it, you could use this regex:
(^| )%( .*|)$
which means
the beginning of line OR a space
followed by the comment char
followed by (a space and zero or more chars) OR nothing
followed by end of line
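Applied with sed -E, that regex looks like the sketch below. It is written with ( .*)? instead of the empty alternative ( .*|); the meaning is the same, but POSIX leaves an alternative that is empty before ")" undefined. Note that it only strips the comment text; it does not join lines the way the other answers do. The function name strip_spaced_comment is hypothetical.

```shell
#!/bin/sh
# strip_spaced_comment (hypothetical helper): delete a '%' comment only when the '%'
# starts the line or follows a space, and is followed by a space or the end of line.
strip_spaced_comment() {
  sed -E 's/(^| )%( .*)?$//'
}

printf '%s\n' 'The date is `date +%Y%m%d` % this is a comment' \
  'Some text % comment to delete' \
  'and some more text %% comment to do not delete' | strip_spaced_comment
```

The "%Y", "%m" and "%d" survive because each '%' there follows a non-space character, and "%%" survives because the first '%' is followed by another '%' rather than a space or end of line.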
Perhaps this (2nd update):
$ sed -e '/^%[^%]/d' -e 's/ %[^%]*$/#/' -e :a -e '/#/N; s/\n//; ta' input | sed 's/#/ /g'
%% comment to do not delete
%% comment to do not delete
Some text and some more text %% comment to do not delete
Use Expression Order with Sed
With sed, the order of instructions can be important. For example:
$ sed -ne '/^% /d; /[^%]%.*/ {s/%.*//; n}; p' /tmp/corpus
%% comment to do not delete
%% comment to do not delete
and some more text %% comment to do not delete
In this example, the sed script performs its tasks in this order:
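Suppress automatic printing of the pattern space (-n).
Delete lines that start with a percent sign followed by a space.
On a line where a percent sign follows some other character, use substitution to remove everything from the first percent sign to the end of the line, then replace the pattern space with the next line of input (n). Because of -n, the truncated line itself is never printed, which is why "Some text" does not appear in the output above.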
Print the pattern space.
This script works with the corpus you provided in your question. It is not guaranteed to work with any other corpus without modifications, and explicitly does not work if the lines you append to the pattern space contain comment characters.
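The difference between n (used in this script) and N is easiest to see in isolation. A minimal sketch:

```shell
#!/bin/sh
# With -n, 'n' replaces the pattern space with the next line without printing the old one.
printf '%s\n' first second | sed -n 'n; p'   # prints: second
# 'N' instead appends the next line after a newline.
printf '%s\n' first second | sed -n 'N; p'   # prints: first, then second
# The script above on two lines of the corpus: "Some text " is discarded by n.
printf '%s\n' 'Some text % comment to delete' 'and more %% keep' |
  sed -ne '/^% /d; /[^%]%.*/ {s/%.*//; n}; p'   # prints: and more %% keep
```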
Edit: added changes so it works on the last line of the file.
Try:
sed -e :a -e '/^[^%]*%%/n; /%/{s/%.*//; N; s/\n//;};ta' file
Tested with input:
%% comment to do not delete
% comment to delete
% another comment to delte
%
%% comment to do not delete
Some text % comment to delete
Some more text % more comment to delete
and some more text %% comment to do not delete
fdgdfgdgdgd %
gfdgd
some text followed by %% comment to not delete that contains a % somewhere
some text followed by % comment to delete that contains %% somewhere
hello there
output:
%% comment to do not delete
%% comment to do not delete
Some text Some more text and some more text %% comment to do not delete
fdgdfgdgdgd gfdgd
some text followed by %% comment to not delete that contains a % somewhere
some text followed by hello there
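As a quick check, the same loop can be run on the question's original sample. This is a sketch assuming GNU sed; the function name strip_and_join is hypothetical.

```shell
#!/bin/sh
# strip_and_join (hypothetical helper): the label/branch loop from this answer.
# Lines with '%%' are printed by n; a lone '%' comment is cut and the next line joined.
strip_and_join() {
  sed -e :a -e '/^[^%]*%%/n; /%/{s/%.*//; N; s/\n//;};ta' "$@"
}

printf '%s\n' '%% comment to do not delete' '% comment to delete' \
  '% another comment to delte' '%% comment to do not delete' \
  'Some text % comment to delete' \
  'and some more text %% comment to do not delete' | strip_and_join
```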
Related
In a file, I have lines like this:
a.lo a.o: abc/util.c \
/usr/lib/def.h
b.lo b.o: hash/imp.h \
/usr/lib/toy.c \
c.lo c.o: high/scan.c \
high/scan_f.c
Here you can see one extra \ (backslash) at the end of line 4 (/usr/lib/toy.c). How can I use sed to remove this \ (backslash)? Because of it I'm getting a "*** multiple target patterns. Stop." error.
P.S. I have this extra \ (backslash) in multiple places in my file, so deleting it by line number with sed isn't feasible. I need something that checks for .lo/.o and, if the line before ends with a \ (backslash), removes it.
Maybe not the simplest but this should work:
sed -nE '${s/\\$//;p;};N;s/\\([^\\]*:)/\1/;P;D' input_file
The main idea is to concatenate input lines in the pattern space (a sed internal text buffer), such that it always contains 2 consecutive lines, separated by a newline character. We then just delete the last \ before a :, if any, print the first of the 2 lines and remove it from the pattern space before continuing with the next line.
sed commands are separated by semicolons (;) and grouped with curly braces ({...}). They are optionally preceded by a line(s) specification, for instance $ that stands for the last line of the input. So, in our case, ${s/\\$//;p;} applies only to the last line while the rest (N;s/\\([^\\]*:)/\1/;P;D) applies to all lines.
The -n option suppresses the default output. We need this to control the output ourselves with the p (print) command.
The -E option enables the use of extended regular expressions.
Let's first explain the tricky part: N;s/\\([^\\]*:)/\1/;P;D. It is a list of 4 commands that are run for each line of the input because there is no line(s) specification before the commands.
When sed starts processing the input the pattern space already contains the first line (a.lo a.o: abc/util.c \ in your example). This is how sed works: by default it puts the current line in the pattern space, applies the commands and restarts with the next line.
N appends the next input line (/usr/lib/def.h) to the pattern space with a newline character as separator. The pattern space now contains:
a.lo a.o: abc/util.c \
/usr/lib/def.h
N also increments the current line number which becomes 2.
s/\\([^\\]*:)/\1/ deletes the last \ before the first : in the pattern space, if there is one. In our example the only \ is after the first :. The pattern space is not modified.
P prints the first part of the pattern space, up to the first newline character. In our example what is printed is:
a.lo a.o: abc/util.c \
D deletes the first part of the pattern space, up to the first newline character (what has just been printed). The pattern space contains:
/usr/lib/def.h
D also starts a new cycle but, unlike normal sed processing, it does not read the next line; it leaves the pattern space and the current line number unmodified. So when restarting, the pattern space contains line 2 and the current line number is still 2.
By induction we see that, each time sed restarts executing the list of commands, the pattern space contains the current line, as normal. When processing line number 4 of your example it contains:
/usr/lib/toy.c \
After N it contains:
/usr/lib/toy.c \
c.lo c.o: high/scan.c \
And there, the substitution command (s/\\([^\\]*:)/\1/) matches and deletes the first \:
/usr/lib/toy.c
c.lo c.o: high/scan.c \
It is thus:
/usr/lib/toy.c
that is printed and removed from the pattern space. Exactly what you want.
The last line needs a special treatment. When we start processing it the pattern space contains:
high/scan_f.c
If we don't do anything special, N cannot change it (there is no next line to concatenate) and terminates the processing; because of -n, the last line is never printed.
This is why another list of commands is needed, just for the last line: ${s/\\$//;p;}. It applies only to the last line because it is preceded by a line(s) specification ($ for last line). The first command in the list (substitute s/\\$//) removes a trailing \, if there is one. The second (p) prints the pattern space.
Note: if you know that the last line does not end with a trailing backslash you can simplify a bit:
sed -nE '$p;N;s/\\([^\\]*:)/\1/;P;D' input_file
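A sketch that wraps the full script and runs it on the question's lines. It assumes GNU sed; the function name drop_dangling_backslash is hypothetical. Note that the space before a removed backslash stays in place, so the fourth output line is "/usr/lib/toy.c " with a trailing space.

```shell
#!/bin/sh
# drop_dangling_backslash (hypothetical helper): keep two lines in the pattern space,
# drop a '\' that is followed (across the newline) by a target line containing ':',
# print and delete the first line, and special-case the last line.
drop_dangling_backslash() {
  sed -nE '${s/\\$//;p;};N;s/\\([^\\]*:)/\1/;P;D' "$@"
}

printf '%s\n' 'a.lo a.o: abc/util.c \' '/usr/lib/def.h' \
  'b.lo b.o: hash/imp.h \' '/usr/lib/toy.c \' \
  'c.lo c.o: high/scan.c \' 'high/scan_f.c' | drop_dangling_backslash
```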
I agree with @G.M. in general, but this will work.
sed captures the text before the trailing "\" on lines that end with a space and "\", and prints only that text on those lines. All other lines are printed unchanged, of course.
sed -e 's/\(.* \)\\$/\1/' input_file
The question is a bit unclear about how to identify the lines from which a trailing backslash should be removed, but inasmuch as the input looks like a set of makefile-format prerequisite lists from which some lines have been removed, I take the objective to be to remove backslashes where they appear after the last (remaining) prerequisite in a list. That requires looking ahead to the next line, so it will be helpful to make use of sed's hold space to store data while you look ahead at the next line to figure out what to do with it.
This would be a pretty robust solution for that problem:
sed -nE 's/\s*(\\){0,1}$/ \\/; :a; /:/ { x; s/\s*\\$//; p; d; }; H; $ { s/.*/:/; b a }' input
That builds up each prerequisite list in the hold space, with backslashes and newlines embedded, then dumps it when the next target list or the end of the input arrives.
Details:
the -n option turns off automatically printing the pattern space after each line
the -E option turns on extended regular expressions
the sed expression contains several sub-expressions, joined by semicolons:
s/\s*(\\){0,1}$/ \\/ : ensure that the current line in the pattern space ends with a space and backslash, without adding a second backslash to lines that already have one
:a : labels that point in the script as 'a'
/:/ { x; s/\s*\\$//; p; d; } : on lines that contain a colon, swap the pattern and hold spaces, remove the trailing backslash from (the new contents of) the pattern space, print the result, then start the next cycle
H : (if control reaches this point) append a newline and the contents of the pattern space to the hold space
$ { s/.*/:/; b a } : on the last line of input trigger dumping the hold space by putting a colon in the pattern space and jumping to label 'a'
[end of expression] : read the next line into the pattern space and start over
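To see the leading blank line that the next alternative avoids, the script can be wrapped and fed the question's sample. A sketch assuming GNU sed (for \s); the name hold_space_lists is hypothetical.

```shell
#!/bin/sh
# hold_space_lists (hypothetical helper): the hold-space script from this answer.
# The first target line swaps out the (still empty) hold space, which prints a blank line.
hold_space_lists() {
  sed -nE 's/\s*(\\){0,1}$/ \\/; :a; /:/ { x; s/\s*\\$//; p; d; }; H; $ { s/.*/:/; b a }' "$@"
}

printf '%s\n' 'a.lo a.o: abc/util.c \' '/usr/lib/def.h' \
  'b.lo b.o: hash/imp.h \' '/usr/lib/toy.c \' \
  'c.lo c.o: high/scan.c \' 'high/scan_f.c' | hold_space_lists
```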
Alternatively, it would more exactly follow your request, and avoid introducing a leading blank line, to do this:
sed -n ':a; /\\$/! { p; d; }; h; :b; $ { x; s/\\$//; p; }; n; /:/ { x; s/\\$//; p; x; b a; }; H; /\\$/ b b; s/.*//; x; p' input
That also assembles pieces in the hold space before ultimately printing them, but it goes about it in a different way:
it starts (at label a) by checking whether the line in the pattern space ends with a backslash. If not (/\\$/!), then it prints the pattern space and starts the next cycle.
otherwise, it replaces the current contents of the hold space with the contents of the pattern space (which must already end with a backslash), then
(at label b) if the current line is the last, it retrieves the contents of the hold space, strips the trailing backslash, and prints the result ($ { x; s/\\$//; p; }). Either way,
it attempts to read the next input line, and terminates if there are no more (n).
if that results in the pattern space containing a colon within, then the contents of the hold space are printed, less trailing backslash, and control is sent back to label a to process the colon-containing line as a new first line (/:/ { x; s/\\$//; p; x; b a; }).
otherwise, a newline and the contents of the pattern space are appended to the hold space (H).
if the pattern space ends with a backslash then control branches back to label b to consider reading another line (/\\$/ b b).
otherwise, the hold space is printed and cleared (s/.*//; x; p), and
if there are any more lines then the next is read and a new cycle started.
That makes fewer assumptions about the nature of the input, but it is a bit more complicated.
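A sketch that exercises this script on the sample and on input whose last line ends with a backslash. It assumes GNU sed, writes the last-line branch as s/\\$// so that only the trailing backslash is removed, and uses the hypothetical name join_lists.

```shell
#!/bin/sh
# join_lists (hypothetical helper): the label-based script from this answer.
# Continuation lines are collected in the hold space; the backslash before a
# new target line (or at end of input) is stripped before printing.
join_lists() {
  sed -n ':a; /\\$/! { p; d; }; h; :b; $ { x; s/\\$//; p; }; n; /:/ { x; s/\\$//; p; x; b a; }; H; /\\$/ b b; s/.*//; x; p' "$@"
}

printf '%s\n' 'a.lo a.o: abc/util.c \' '/usr/lib/def.h' \
  'b.lo b.o: hash/imp.h \' '/usr/lib/toy.c \' \
  'c.lo c.o: high/scan.c \' 'high/scan_f.c' | join_lists
```

Unlike the previous script, no blank line is printed first, and the space before the removed backslash is kept ("/usr/lib/toy.c ").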
I am on Mac, I want to find a pattern in lines, replace it with something, then append the resulting string to the end of the original line. Here is what I tried:
echo "test='123'" | sed -E '/([^a-z])/ s/$/ \1/'
sed: 1: "/([^a-z])/ s/$/ \1/": \1 not defined in the RE
What do I need to define \1? I thought I did it with ([^a-z]). No?
Edit: perhaps this code better represents what I want:
1) echo "test='123'" | sed 's/[a-zA-Z0-9]//g'
2) I want the new line = original line + the output of step 1 above
In other words:
Before (what I get): test='123'
After (what I want): test='123' =''
You can edit this command this way:
echo "test='123'" | sed -E 'h;s/([a-zA-Z0-9])//g;G;s/(.*)\n(.*)/\2\1/'
For readability, the script, line by line, reads
h
s/([a-zA-Z0-9])//g
G
s/(.*)\n(.*)/\2\1/
h stores the current line in the hold space,
your s command does what it does
G appends the content of the hold space, i.e. the original line, to the pattern space, i.e. the current line as you have edited it, putting a newline \n in between.
another s command reorders the two pieces, also removing the \n that the G command inserted.
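A small sketch of the same h/G technique. The space between \2 and \1 is an assumption taken from the desired output test='123' ='' (the version above, with \2\1, produces test='123'=''); the function name append_stripped is hypothetical.

```shell
#!/bin/sh
# append_stripped (hypothetical helper): save the line (h), edit it, append the
# saved original after a newline (G), then swap the two halves around a space.
append_stripped() {
  sed -E 'h; s/[a-zA-Z0-9]//g; G; s/(.*)\n(.*)/\2 \1/'
}

echo "test='123'" | append_stripped   # prints: test='123' =''
```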
Comments
Your original attempt sed -E '/([^a-z])/ s/$/ \1/' could not work because \1 refers to what is captured by the leftmost (…) group in the search portion of the s command; it does not "remember" the group(s) you used to address the line.
Once you print the pattern space with p, a newline comes with it, and once it's been printed, there's no way you can remove it within the same sed program.
I'm attempting to create a single newline at the end of a file.
My command is this:
gsed -i '$a\\r' outfiles/*.txt
Somehow this creates two newlines, and I cannot figure out what I am doing wrong.
Any thoughts?
My first thought was to substitute the end of the last line with a newline:
sed '$s/$/\n/'
But my second thought is nicer:
sed '$G'
G (get from the hold space) appends a newline to the pattern space and then appends the hold space to the pattern space. Because the hold space is empty, it effectively adds only the newline.
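A minimal sketch of $G on a stream and, with GNU sed's -i, on several files. GNU sed's -i implies -s, so $ addresses the last line of each file rather than of the concatenated input; the file names are made up for the demo.

```shell
#!/bin/sh
# '$G' appends a newline plus the (empty) hold space to the last line only.
printf 'one\ntwo\n' | sed '$G'   # prints one, two, then an empty line

# In place on several files (GNU sed; BSD sed needs -i '' instead).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'a\n' > "$tmpdir/x.txt"
printf 'b\n' > "$tmpdir/y.txt"
sed -i '$G' "$tmpdir"/*.txt   # -i implies -s: '$' is each file's last line
```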
Keep it clear and simple, just use gawk:
gawk -i inplace 'ENDFILE{print ""}' outfiles/*.txt
Given the input:
1234
5678
9abc
defg
hijk
I'd like the output:
12345678
56789abc
9abcdefg
defghijk
There are lots of examples using sed(1) to join a pair of lines, then the next pair after that pair and so on. But I haven't found an example that joins line 1 with 2, 2 with 3, 3 with 4, ...
sed(1) solution preferred. Other options are less interesting - e.g., awk(1), python(1) and perl(1) implementations are fairly easy. I'm specifically stumped on a successful sed(1) incantation.
sed '1h;1d;x;G;s/\n//'
I guess it can be done some other way, but this works for me:
$ cat in
1234
5678
9abc
defg
hijk
$ sed '1h;1d;x;G;s/\n//' in
12345678
56789abc
9abcdefg
defghijk
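How it works: we put the first line in the hold space, and that's it for the first line (1d deletes it without printing). For every line after the first, swap it with the hold space (x), so the pattern space holds the previous line and the hold space the current one; append the hold space to the pattern space (G); and remove the newline.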
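The same script wrapped as a reusable filter, as a sketch; the function name join_successive is hypothetical.

```shell
#!/bin/sh
# join_successive (hypothetical helper): line 1 goes to the hold space; each later
# line is swapped with it (x), the hold space is appended (G), and the newline removed.
join_successive() {
  sed '1h;1d;x;G;s/\n//' "$@"
}

printf '%s\n' 1234 5678 9abc defg hijk | join_successive
```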
This does it (now improved, thanks to potong's hint):
$ sed -n 'N;s/\n\(.*\)/\1&/;P;D' infile
12345678
56789abc
9abcdefg
defghijk
In detail:
N # Append next line to pattern space
s/\n\(.*\)/\1&/ # Make 111\n222 into 111222\n222
P # Print up to first newline
D # Delete up to first newline
The substitution turns these two lines
1111
2222
which in the pattern space look like 1111\n2222, into
11112222
2222
and the P and D print/delete the first line from the pattern space.
Notice that we never hit the bottom of the script (D starts a new loop) until the very last line, where N can't fetch a new line and would just print the last line on its own, if we didn't suppress that with -n.
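A sketch comparing this sliding-window script with the hold-space version from the previous answer: both give the same result on multi-line input, and both print nothing for a single-line input (line 1 is deleted, or N finds no next line under -n). The function names are hypothetical.

```shell
#!/bin/sh
via_hold()   { sed '1h;1d;x;G;s/\n//'; }            # hypothetical name
via_window() { sed -n 'N;s/\n\(.*\)/\1&/;P;D'; }    # hypothetical name

printf '%s\n' 1234 5678 9abc | via_hold
printf '%s\n' 1234 5678 9abc | via_window
printf '%s\n' 1234 | via_window   # single line: no output
```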
Tweaking another answer (full credit to @aragaer) to handle single-line input (and to be more portable to BSD sed as well as GNU sed than the original version; update: that answer has since been edited another way for portability):
% cat > inputfile << eof
12
34
56
eof
% sed -e '1{$p;h;d' -e '}' -e 'x;G;s/\n//' inputfile # bsd + gnu sed [1]
1234
3456
or
% cat joinsuccessive.sed
1{
$p;h;d
}
x;G;s/\n//
% sed -f joinsuccessive.sed inputfile
1234
3456
Here's an annotated version.
1{ # special case for first line only:
$p # even MORE special case: print current line for input with
# only a single line
h # add line 1 to hold space (for joining with successive lines)
d # delete pattern space and move to next line (without printing)
}
x # for lines 2+, swap pattern space (current line) and hold space
G # add newline + hold space (now has current line) to pattern space
# (previous line) giving prev line, newline, curr line in pattern
# space (and curr line is in hold space)
s/\n// # remove newline added by G (between lines) before printing the
# pattern space
[1] bsd sed(1) wants a closing brace to be on a line by itself. Use -e to "build" the script or put the commands in a sed script file (and use -f joinsuccessive.sed).
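A sketch of the tweaked script as a filter, showing that a single-line input is now printed as-is. The function name join_successive_portable is hypothetical.

```shell
#!/bin/sh
# join_successive_portable (hypothetical helper): the -e form keeps '}' on its own line.
join_successive_portable() {
  sed -e '1{$p;h;d' -e '}' -e 'x;G;s/\n//' "$@"
}

printf '%s\n' 12 34 56 | join_successive_portable   # prints 1234, 3456
printf '%s\n' 12 | join_successive_portable         # prints 12
```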
I want to use sed to delete part of code (paragraph) beginning with a pattern and ending with a semicolon (;).
Now I came across an example to delete a paragraph separated by new lines
sed -e '/./{H;$!d;}' -e 'x;/Pattern/!d'
I'm confused how to use semicolon not as a delimiter but as a pattern instead.
Thanks.
Another option is to use an address range.
The next example deletes everything from a line that contains pattern through the next line that ends with a semicolon.
sed '/pattern/,/;$/ d' infile
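A small sketch of the range, including one pitfall: when the second address is a regex, it is only checked starting from the line after the one that opened the range, so a start line that itself ends with ';' does not close the range.

```shell
#!/bin/sh
printf '%s\n' keep 'pattern start' middle 'end;' after |
  sed '/pattern/,/;$/ d'   # prints keep, after

printf '%s\n' 'LOG(x);' keep 'more;' after |
  sed '/LOG/,/;$/ d'       # prints only: after
```

That pitfall is why a one-line statement needs its own deletion command before the range.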
EDIT, in response to Harsh's comment:
Try next sed command:
sed '/^\s*LOG\s*(.*;\s*$/ d ; /^\s*LOG/,/;\s*$/ d' infile
Explanation:
/^\s*LOG\s*(.*;\s*$/ d # Delete line if begins with 'LOG' and ends with semicolon.
/^\s*LOG/,/;\s*$/ d # Delete range of lines between one that begins with LOG and
# other that ends with semicolon.
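A sketch of that command on a made-up input with one single-line and one multi-line LOG call. It assumes GNU sed (\s is a GNU extension); the function name strip_log_calls is hypothetical.

```shell
#!/bin/sh
# strip_log_calls (hypothetical helper): first delete one-line LOG(...); statements,
# then delete ranges from a LOG line through the next line ending with ';'.
strip_log_calls() {
  sed '/^\s*LOG\s*(.*;\s*$/ d ; /^\s*LOG/,/;\s*$/ d' "$@"
}

printf '%s\n' 'int a;' '  LOG("one");' 'foo();' 'LOG("two",' '    x);' 'bar();' |
  strip_log_calls   # prints int a;, foo();, bar();
```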
This might work for you:
cat <<! >file
a
b
;
x
y
;
!
sed '/^[^;]*$/{H;$!d};x;s/;//;/x/!d' file
x
y
Explanation:
For any line that does not contain a ; (/^[^;]*$/):
Append the line to the hold space (HS), then delete the pattern space (PS) and begin the next iteration, unless it is the last line of the file. {H;$!d}
Otherwise (a line containing a ;, or the last line of the file):
Swap to the HS x
Delete the first ; s/;//
Search for pattern (x) and if not found delete the PS /x/!d
N.B. This finds any pattern /x/; to find the beginning pattern, use /^x/.
EDIT:
After having seen your data and expected result, this may work for you:
sed '/^\s*LOG(.*);/d;/^\s*LOG(/,/);/d' file