Remove multi-line comments - sed

How do I remove all comments if they start with /* and end with */
I have tried the following. It works for one line comment.
sed '/\/\*/d'
But it does not remove multiline comments. for e.g. the second and third lines are not removed.
/*!50500 PARTITION BY RANGE (TO_SECONDS(date_time ))
PARTITION 20120102parti VALUES LESS THAN (63492681600),
(PARTITION 20120101parti VALUES LESS THAN (63492595200) */ ;
In the above example, I need to retain the last ; after the closing comment sign.

Here's one way using GNU sed. Run like sed -rf script.sed file.txt
Contents of script.sed:
:a
s%(.*)/\*.*\*/%\1%
ta
/\/\*/ !b
N
ba
Alternatively, here's the one liner:
sed -r ':a; s%(.*)/\*.*\*/%\1%; ta; /\/\*/ !b; N; ba' file.txt

If this is in a C file then you MUST use a C preprocessor for this in combination with other tools to temporarily disable specific preprocessor functionality like expanding #defines or #includes, all other approaches will fail in edge cases. This will work for all cases:
[ $# -eq 2 ] && arg="$1" || arg=""
eval file="\$$#"
sed 's/a/aA/g; s/__/aB/g; s/#/aC/g' "$file" |
gcc -P -E $arg - |
sed 's/aC/#/g; s/aB/__/g; s/aA/a/g'
Put it in a shell script and call it with the name of the file you want parsed, optionally prefixed by a flag like "-ansi" to specify the C standard to apply.
See https://stackoverflow.com/a/35708616/1745001 for details.

This should do
sed 's|/\*|\n&|g;s|*/|&\n|g' a.txt | sed '/\/\*/,/*\//d'
For test:
a.txt
/* Line test
multi
comment */
Hello there
this would stay
/* this would be deleteed */
Command:
$ sed 's|/\*|\n&|g;s|*/|&\n|g' a.txt | sed '/\/\*/,/*\//d'
Hello there
this would stay

This might work for you (GNU sed):
sed -r ':a;$!{N;ba};s|/\*[^*]*\*+([^/*][^*]*\*+)*/||' file
It's a start, anyway!

To complement Ed's answer (focused on C files), I would suggest the excellent sed script remccoms3.sed by Brian Hiles for non-C files (e.g. PL/SQL file). It handles C and C++ (//) comments and correctly skips comments inside strings. The script is available here: http://sed.sourceforge.net/grabbag/scripts/remccoms3.sed

Try this
sed "/^\//,/\/;/d" filename

A sed-only solution:
sed -r 's/\/\*(.*?)\*\///g' \
| sed -r 's/(.+)(\/\*)/\1\n\2/g'\
| sed -r 's/(\*\/)(.+)/\1\n\2/g' \
| sed '/\/\*/,/\*\// s/.*//'
Shortcomings: multi-line comments will leave empty lines (because sed is line-based, unless you put in superhuman efforts).
Explanation
s/\/\*(.*?)\*\///g will take care of single-line comments.
s/(.+)(\/\*)/\1\n\2/g and s/(\*\/)(.+)/\1\n\2/g will split lines at the beginning and end of multi-line comments.
/\/\*/,/\*\// s/.*// will run the command s/.*// effectively deleting all the lines between the patterns \/\* and \*\/ - which is /* and */ escaped.

Related

Simple method for finding and replacing string linux

I'm currently trying to find a line in a file
#define IMAX 8000
and replacing 8000 with another number.
Currently, stuck trying to pipe arguments from awk into sed.
grep '#define IMAX' 1d_Euler_mpi_test.c | awk '{print $3}' | sed
Not too sure how to proceed from here.
I would do something like:
sed -i '' '/^#define IMAX 8000$/s/8000/NEW_NUMBER/' 1d_Euler_mpi_test.c
Could you please try following. Place new number's value in place of new_number too.(tested this with GNU sed)
echo "#define IMAX 8000" | sed -E '/#define IMAX /s/[0-9]+$/new_number/'
In case you are reading input from an Input_file and want to save its output into Input_file itself use following then.
sed -E '/#define IMAX /s/[0-9]+$/new_number/' Input_file
Add -i flag in above code in case you want to save output into Input_file itself. Also my codes will catch any digits which are coming at the end of the line which has string #define IMAX so in case you only want to look for 8000 or any fixed number change [0-9]+$ to 8000 etc in above codes then.
You may use GNU sed.
sed -i -e 's/IMAX 8000/IMAX 9000/g' /tmp/file.txt
Which will invoke sed to do an in-place edit due to the -i option. This can be called from bash.
If you really really want to use just bash, then the following can work:
while read a ; do echo ${a//IMAX 8000/IMAX 9000} ; done < /tmp/file.txt > /tmp/file.txt.t ; mv /tmp/file.txt{.t,}
This loops over each line, doing a substitution, and writing to a temporary file (don't want to clobber the input). The move at the end just moves temporary to the original name.

Using variables in sed -f (where sed script is in a file rather than inline)

We have a process which can use a file containing sed commands to alter piped input.
I need to replace a placeholder in the input with a variable value, e.g. in a single -e type of command I can run;
$ echo "Today is XX" | sed -e "s/XX/$(date +%F)/"
Today is 2012-10-11
However I can only specify the sed aspects in a file (and then point the process at the file), E.g. a file called replacements.sed might contain;
s/XX/Thursday/
So obviously;
$ echo "Today is XX" | sed -f replacements.sed
Today is Thursday
If I want to use an environment variable or shell value, though, I can't find a way to make it expand, e.g. if replacements.txt contains;
s/XX/$(date +%F)/
Then;
$ echo "Today is XX" | sed -f replacements.sed
Today is $(date +%F)
Including double quotes in the text of the file just prints the double quotes.
Does anyone know a way to be able to use variables in a sed file?
This might work for you (GNU sed):
cat <<\! > replacements.sed
/XX/{s//'"$(date +%F)"'/;s/.*/echo '&'/e}
!
echo "Today is XX" | sed -f replacements.sed
If you don't have GNU sed, try:
cat <<\! > replacements.sed
/XX/{
s//'"$(date +%F)"'/
s/.*/echo '&'/
}
!
echo "Today is XX" | sed -f replacements.sed | sh
AFAIK, it's not possible. Your best bet will be :
INPUT FILE
aaa
bbb
ccc
SH SCRIPT
#!/bin/sh
STRING="${1//\//\\/}" # using parameter expansion to prevent / collisions
shift
sed "
s/aaa/$STRING/
" "$#"
COMMAND LINE
./sed.sh "fo/obar" <file path>
OUTPUT
fo/obar
bbb
ccc
As others have said, you can't use variables in a sed script, but you might be able to "fake" it using extra leading input that gets added to your hold buffer. For example:
[ghoti#pc ~/tmp]$ cat scr.sed
1{;h;d;};/^--$/g
[ghoti#pc ~/tmp]$ sed -f scr.sed <(date '+%Y-%m-%d'; printf 'foo\n--\nbar\n')
foo
2012-10-10
bar
[ghoti#pc ~/tmp]$
In this example, I'm using process redirection to get input into sed. The "important" data is generated by printf. You could cat a file instead, or run some other program. The "variable" is produced by the date command, and becomes the first line of input to the script.
The sed script takes the first line, puts it in sed's hold buffer, then deletes the line. Then for any subsequent line, if it matches a double dash (our "macro replacement"), it substitutes the contents of the hold buffer. And prints, because that's sed's default action.
Hold buffers (g, G, h, H and x commands) represent "advanced" sed programming. But once you understand how they work, they open up new dimensions of sed fu.
Note: This solution only helps you replace entire lines. Replacing substrings within lines may be possible using the hold buffer, but I can't imagine a way to do it.
(Another note: I'm doing this in FreeBSD, which uses a different sed from what you'll find in Linux. This may work in GNU sed, or it may not; I haven't tested.)
I am in agreement with sputnick. I don't believe that sed would be able to complete that task.
However, you could generate that file on the fly.
You could change the date to a fixed string, like
__DAYOFWEEK__.
Create a temp file, use sed to replace __DAYOFWEEK__ with $(date +%Y).
Then parse your file with sed -f $TEMPFILE.
sed is great, but it might be time to use something like perl that can generate the date on the fly.
To add a newline in the replacement expression using a sed file, what finally worked for me is escaping a literal newline. Example: to append a newline after the string NewLineHere, then this worked for me:
#! /usr/bin/sed -f
s/NewLineHere/NewLineHere\
/g
Not sure it matters but I am on Solaris unix, so not GNU sed for sure.

sed to remove URLs from a file

I am trying to write a sed expression that can remove urls from a file
example
http://samgovephotography.blogspot.com/ updated my blog just a little bit ago. Take a chance to check out my latest work. Hope all is well:)
Meet Former Child Star & Author Melissa Gilbert 6/15/09 at LA's B&N https://hollywoodmomblog.com/?p=2442 Thx to HMB Contributor #kdpartak :)
But I dont get it:
sed 's/[\w \W \s]*http[s]*:\/\/\([\w \W]\)\+[\w \W \s]*/ /g' posFile
FIXED!!!!!
handles almost all cases, even malformed URLs
sed 's/[\w \W \s]*http[s]*[a-zA-Z0-9 : \. \/ ; % " \W]*/ /g' positiveTweets | grep "http" | more
The following removes http:// or https:// and everything up until the next space:
sed -e 's!http\(s\)\{0,1\}://[^[:space:]]*!!g' posFile
updated my blog just a little bit ago. Take a chance to check out my latest work. Hope all is well:)
Meet Former Child Star & Author Melissa Gilbert 6/15/09 at LA's B&N Thx to HMB Contributor #kdpartak :)
Edit:
I should have used:
sed -e 's!http[s]\?://\S*!!g' posFile
"[s]\?" is a far more readable way of writing "an optional s" compared to "\(s\)\{0,1\}"
"\S*" a more readable version of "any non-space characters" than "[^[:space:]]*"
I must have been using the sed that came installed with my Mac at the time I wrote this answer (brew install gnu-sed FTW).
There are better URL regular expressions out there (those that take into account schemes other than HTTP(S), for instance), but this will work for you, given the examples you give. Why complicate things?
The accepted answer provides the approach that I used to remove URLs, etc. from my files. However it left "blank" lines. Here is a solution.
sed -i -e 's/http[s]\?:\/\/\S*//g ; s/www\.\S*//g ; s/ftp:\S*//g' input_file
perl -i -pe 's/^'`echo "\012"`'${2,}//g' input_file
The GNU sed flags, expressions used are:
-i Edit in-place
-e [-e script] --expression=script : basically, add the commands in script
(expression) to the set of commands to be run while processing the input
^ Match start of line
$ Match end of line
? Match one or more of preceding regular expression
{2,} Match 2 or more of preceding regular expression
\S* Any non-space character; alternative to: [^[:space:]]*
However,
sed -i -e 's/http[s]\?:\/\/\S*//g ; s/www\.\S*//g ; s/ftp:\S*//g'
leaves nonprinting character(s), presumably \n (newlines). Standard sed-based approaches to remove "blank" lines, tabs and spaces, e.g.
sed -i 's/^[ \t]*//; s/[ \t]*$//'
do not work, here: if you do not use a "branch label" to process newlines, you cannot replace them using sed (which reads input one line at a time).
The solution is to use the following perl expression:
perl -i -pe 's/^'`echo "\012"`'${2,}//g'
which uses a shell substitution,
'`echo "\012"`'
to replace an octal value
\012
(i.e., a newline, \n), that occurs 2 or more times,
{2,}
(otherwise we would unwrap all lines), with something else; here:
//
i.e., nothing.
[The second reference below provides a wonderful table of these values!]
The perl flags used are:
-p Places a printing loop around your command,
so that it acts on each line of standard input
-i Edit in-place
-e Allows you to provide the program as an argument,
rather than in a file
References:
perl flags: Perl flags -pe, -pi, -p, -w, -d, -i, -t?
ASCII control codes: https://www.cyberciti.biz/faq/unix-linux-sed-ascii-control-codes-nonprintable/
remove URLs: sed to remove URLs from a file
branch labels: How can I replace a newline (\n) using sed?
GNU sed manual: https://www.gnu.org/software/sed/manual/sed.html
quick regex guide: https://www.gnu.org/software/sed/manual/html_node/Regular-Expressions.html
Example:
$ cat url_test_input.txt
Some text ...
https://stackoverflow.com/questions/4283344/sed-to-remove-urls-from-a-file
https://www.google.ca/search?dcr=0&ei=QCsyWtbYF43YjwPpzKyQAQ&q=python+remove++citations&oq=python+remove++citations&gs_l=psy-ab.3...1806.1806.0.2004.1.1.0.0.0.0.61.61.1.1.0....0...1c.1.64.psy-ab..0.0.0....0.-cxpNc6youY
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
https://bbengfort.github.io/tutorials/2016/05/19/text-classification-nltk-sckit-learn.html
http://datasynce.org/2017/05/sentiment-analysis-on-python-through-textblob/
https://www.google.ca/?q=halifax&gws_rd=cr&dcr=0&ei=j7UyWuGKM47SjwOq-ojgCw
http://www.google.ca/?q=halifax&gws_rd=cr&dcr=0&ei=j7UyWuGKM47SjwOq-ojgCw
www.google.ca/?q=halifax&gws_rd=cr&dcr=0&ei=j7UyWuGKM47SjwOq-ojgCw
ftp://ftp.ncbi.nlm.nih.gov/
ftp://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/alignment_indices/20100804.alignment.index
Some more text.
$ sed -e 's/http[s]\?:\/\/\S*//g ; s/www\.\S*//g ; s/ftp:\S*//g' url_test_input.txt > a
$ cat a
Some text ...
Some more text.
$ perl -i -pe 's/^'`echo "\012"`'${2,}//g' a
Some text ...
Some more text.
$

sed + remove "#" and empty lines with one sed command

how to remove comment lines (as # bal bla ) and empty lines (lines without charecters) from file with one sed command?
THX
lidia
If you're worried about starting two sed processes in a pipeline for performance reasons, you probably shouldn't be, it's still very efficient. But based on your comment that you want to do in-place editing, you can still do that with distinct commands (sed commands rather than invocations of sed itself).
You can either use multiple -e arguments or separate commands with a semicolon, something like (just one of these, not both):
sed -i 's/#.*$//' -e '/^$/d' fileName
sed -i 's/#.*$//;/^$/d' fileName
The following transcript shows this in action:
pax> printf 'Line # with a comment\n\n# Line with only a comment\n' >file
pax> cat file
Line # with a comment
# Line with only a comment
pax> cp file filex ; sed -i 's/#.*$//;/^$/d' filex ; cat filex
Line
pax> cp file filex ; sed -i -e 's/#.*$//' -e '/^$/d' filex ; cat filex
Line
Note how the file is modified in-place even with two -e options. You can see that both commands are executed on each line. The line with a comment first has the comment removed then all is removed because it's empty.
In addition, the original empty line is also removed.
#paxdiablo has a good answer but it can be improved.
(1) The '/^$/d' clause only matches 100% blank lines.
If you want to also match lines that are entirely whitespace (spaces, tabs etc.) use this instead:
'/^\s*$/d'
(2) The 's/#.*$//' clause only matches lines that start with the # character in column 0.
If you want to also match lines that have only whitespace before the first # use this instead:
'/^\s*#.*$/d'
The above criteria may not be universal (e.g. within a HEREDOC block, or in a Python multi-line string the different approaches could be significant), but in many cases the conventional definition of "blank" lines include whitespace-only, and "comment" lines include whitespace-then-#.
(3) Lastly, on OSX at least, the #paxdiablo solution in which the first clause turns comment lines into blank lines, and the second clause strips blank lines (including what were originally comments) doesn't work. It seems to be more portable to make both clauses /d delete actions as I've done.
The revised command incorporating the above is:
sed -e '/^\s*#.*$/d' -e '/^\s*$/d' inputFile
This tiny jewel removes all # comments, no matter where they begin in a line (see caution below):
sed -e 's/\s*#.*$//'
Example:
text="
this is a # test
#this is a test
#this is a #test
this is # another #test
"
$echo "$text" | sed -e 's/\s*#.*$//'
this is a
this is
Next this removes any resulting blank lines:
$echo "$text" | sed -e 's/\s*#.*$//' | sed -e '/^\s*$/d'
Caution: Depending on the syntax and/or interpretation of the lines your processing, this might not be an appropriate solution, as it just stupidly removes end of lines, even if the '#' is part of your data or code. However, for use cases where you'll never use a hash except for as an end of line comment then it works fine. So just as with all coding, context must be taken into consideration.
Alternative variant, using grep:
cat file.txt | grep -Ev '(#.*$)|(^$)'
you can use awk
awk 'NF{gsub(/^[ \t]*#/,"");print}' file
First example(paxdiablo) is very good except its not change file, just output result. If you want to change it inline:
sudo sed -i 's/#.*$//;/^$/d' inputFile
On (one of) my linux boxes, sed understands extended regular expressions with the -r option, so:
sed -r '/(^\s*#)|(^\s*$)/d' squid.conf.installed
is very useful for showing all non-blank, non comment lines.
The regex matches either start of line followed by zero or more spaces or tabs followed by either a hash or end of line, and deletes those matching lines from the input.

Have sed make substitute on string but SKIP first occurrence

I have been through the sed one liners but am still having trouble with my goal. I want to substitue matching strings on all but the first occurrence of a line. My exact usage would be:
$ echo 'cd /Users/joeuser/bump bonding/initial trials' | sed <<MAGIC HAPPENS>
cd /Users/joeuser/bump\ bonding/initial\ trials
The line replaced the space in bump bonding with the slash space bump\ bonding so that I can execute this line (since when the spaces aren't escaped I wouldn't be able to cd to it).
Update: I solved this by just using single quotes and outputting
cd 'blah blah/thing/another space/'
and then using source to execute the command. But it didn't answer my question. I'm still curious though... how would you use sed to fix it?
s/ /\\ /2g
The 2 specifies that the second one should apply, and the g specifies that all the rest should apply too. (This probably only works on GNU sed. According to the Open Group Base Specification, "If both g and n are specified, the results are unspecified.")
You can avoid the problem with g and n
Replace all of them, then undo the first one:
sed -e 's/ /\\ /g' -e 's/\\ / /1'
Here's another method which uses the t branch-if-substituted command:
sed ':a;s/\([^ ]* .*[^\\]\) \(.*\)/\1\\ \2/;ta'
which has the advantage of leaving existing backslash-space sequences in the input intact.
use awk
$ echo cd 'blah blah/thing/another space/' | awk '{for(i=2;i<NF;i++) $i=$i"\\"}1'
cd blah\ blah/thing/another\ space/
$ echo 'cd /Users/joeuser/bump bonding/initial trials' | awk '{for(i=2;i<NF;i++) $i=$i"\\"}1'
cd /Users/joeuser/bump\ bonding/initial\ trials