Can I use the sed command to replace multiple empty lines with one empty line? - sed

I know there is a similar question on SO, How can I replace multiple empty lines with a single empty line in bash?, but my question is: can this be done using only the sed command?
Thanks

Give this a try:
sed '/^$/N;/^\n$/D' inputfile
Explanation:
/^$/N - match an empty line and append the next input line to the pattern space (the two are joined by a newline).
; - command delimiter, allows multiple commands on one line, can be used instead of separating commands into multiple -e clauses for versions of sed that support it.
/^\n$/D - if the pattern space contains only a newline in addition to the one at the end of the pattern space, in other words a sequence of more than one newline, then delete the first newline (more generally, the beginning of pattern space up to and including the first included newline)
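A quick way to check the behaviour described above is to feed the command a small sample. This is only a minimal sketch, assuming GNU sed; the function name squeeze_blank_lines and the sample text are made up for illustration.

```shell
# Hypothetical helper: collapse each run of empty lines into one empty line.
squeeze_blank_lines() {
  sed '/^$/N;/^\n$/D'
}

# Three consecutive empty lines between "a" and "b" become a single one.
printf 'a\n\n\n\nb\n' | squeeze_blank_lines
```

Single empty lines pass through unchanged, because the N step then pulls in a non-empty line and the D step does not fire.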

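You can do this by deleting the empty lines first and then using the G command to append an empty line after each remaining line (G appends a newline followed by the hold space, which is empty here):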
sed '/^$/d;G' text.txt
Edit2: the above command adds an empty line after every non-empty line, not only between paragraphs. If this is not desired, you could do:
sed -n '1{/^$/p};{/./,/^$/p}'
Or, if you don't mind that all leading empty lines will be stripped, it may be written as:
sed -n '/./,/^$/p'
since the first expression only examines the first line and prints it if it is blank.
Here, the -n option suppresses automatic printing of the pattern space, /./,/^$/ defines a range that starts at a line containing at least one character and ends at the next empty line, and p prints every line in that range.
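The range form can be tried on a small sample as follows. This is a sketch rather than a definitive recipe; the helper name print_paragraphs and the input text are assumptions made for the example.

```shell
# Hypothetical helper: print each block of non-empty lines plus the single
# empty line that ends it; leading and repeated empty lines are dropped.
print_paragraphs() {
  sed -n '/./,/^$/p'
}

# Two leading empty lines and a run of three empty lines in the middle.
printf '\n\nfirst\n\n\n\nsecond\n' | print_paragraphs
```

Empty lines that fall outside a range match neither /./ nor an open range, so they are never printed.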

Related

Matching patterns across lines

Suppose I have a file which contains:
something
line=1
file=2
other
lines
ignore
something
line=2
file=3
other
lines
ignore
Eventually, I want a unique list of the line and file combinations in each section. In the first stage I am trying to get sed to output just those lines combined into one line, like
line=1file=2
line=2file=3
Then I can use sort and uniq.
So I am trying
sed -n -r 's/(line=)(.*?)(\r)(file=)(.*?)(\r)/\1\2\4\5/p' sample.txt
(It isn't necessarily just a number after each)
But it won't match across the lines. I have tried \n and \r\n but it doesn't seem to be the style of new line, since:
sed -n -r 's/(line=)(.*?)(\r)/\1\2/p' sample.txt
Will output the "line=" lines, but I just can't get it to span the new line, and collect the second line as well.
By default, sed processes its input one line at a time, splitting on the \n character, so a pattern cannot match across lines unless further lines are added to the pattern space. Some sed implementations support the -z option, which splits the input on the ASCII NUL character instead of the newline character (this can work for small files, assuming the NUL character does not occur in the text you want to match)
There are also some sed commands that can be used for multiline processing
sed -n '/line=/{N;s/\n//p}'
The N command appends the next line to the chunk currently being processed (which has to match line= in this case)
s/\n//p then deletes the newline character and prints the result, so that you get the output as a single line
If your input has dos style line ending, first convert it to unix style (see Why does my tool output overwrite itself and how do I fix it?) or take care of \r as well
sed -n '/line=/{N;s/\r\n//p}'
Note that these commands were tested on GNU sed, syntax may vary for other implementations
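Putting the N approach together with the sort/uniq step from the question might look like the sketch below. It assumes GNU sed and Unix line endings; the helper name unique_pairs and the sample rows are invented for illustration, and sort -u stands in for sort followed by uniq.

```shell
# Hypothetical helper: join each "line=" row with the row after it,
# then keep one copy of every combination.
unique_pairs() {
  sed -n '/line=/{N;s/\n//p;}' | sort -u
}

# The first and third sections repeat the same combination.
printf '%s\n' something line=1 file=2 other something line=2 file=3 other \
  something line=1 file=2 | unique_pairs
```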

gnu sed remove portion of line after pattern match with special characters

The goal is to use sed to return only the url from each line of FF extension Mining Blocker which uses this format for its regex lines:
{"baseurl":"*://002.0x1f4b0.com/*", "suburl":"*://*/002.0x1f4b0.com/*"},
{"baseurl":"*://003.0x1f4b0.com/*", "suburl":"*://*/003.0x1f4b0.com/*"},
the result should be:
002.0x1f4b0.com
003.0x1f4b0.com
One way would be to keep everything after suburl":"*://*/ then remove each occurrence of /*"},
I found https://unix.stackexchange.com/questions/24140/return-only-the-portion-of-a-line-after-a-matching-pattern but the special characters are a problem.
this won't work:
sed -n -e s#^.*suburl":"*://*/##g hosts
Would someone please show me how to mark the 2 asterisks in the string so they are seen by regex as literal characters, not wildcards?
edit:
sed -n 's#.*://\*/\([^/]\+\)/.*#\1#p' hosts
doesn't work, unfortunately.
regarding character substitution, thanks for directing me to the references.
I reduced the searched-for string to //*/ and used ASCII character codes like this:
sed -n -e s#^.*\d047\d047\d042\d047##g hosts
Unfortunately, that didn't output any changes to the lines.
My assumptions are:
^.*something specifies everything up to and including the last occurrence of "something" in a line
sed -n -e s#search##g deletes (replace with nothing) "search" within a line
So, this line:
sed -n -e s#^.*\d047\d047\d042\d047##g hosts
Should output everything after //*/ in each line...except it doesn't.
What is incorrect with that line?
Regarding deleting everything including and after the first / AFTER that first operation, yes, that's wanted too.
This might work for you (GNU sed):
sed -n 's#.*://\*/\([^/]\+\)/.*#\1#p' file
Match greedily (the longest string that matches) all characters up to and including the last ://*/, followed by a group of characters that are not / (referred to as \1), followed by a / and the rest of the line, and replace the whole line with the group \1.
N.B. the sed substitution delimiters are arbitrary; here # is chosen so that / can be matched without escaping. The character * on the left-hand side of the substitution is normally a metacharacter meaning zero or more of the previous character or group, so it is escaped as \* to make it literal. Finally, the -n option turns off the usual printing of the pattern space at the end of each cycle, and the p flag on the substitution prints the pattern space only after a successful substitution, so the output contains only the URLs, or nothing for lines that do not match.
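The substitution can be exercised against the two sample lines from the question. The sketch below is an illustration only: the helper name extract_hosts is made up, and [^/][^/]* is used as a portable spelling of the GNU-specific [^/]\+ so that it does not depend on one implementation.

```shell
# Hypothetical helper: print only the host that follows the last "://*/".
# [^/][^/]* means "one or more characters that are not a slash".
extract_hosts() {
  sed -n 's#.*://\*/\([^/][^/]*\)/.*#\1#p'
}

printf '%s\n' \
  '{"baseurl":"*://002.0x1f4b0.com/*", "suburl":"*://*/002.0x1f4b0.com/*"},' \
  '{"baseurl":"*://003.0x1f4b0.com/*", "suburl":"*://*/003.0x1f4b0.com/*"},' |
  extract_hosts
```

Quoting the whole script in single quotes keeps the shell from touching the * and " characters, which the unquoted attempts in the question left exposed.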

SED - replace string newline anything with string newline variable

I have the following content in a file
  dhcp_option_domain:
  - test.domain
And what I need to do is this:
whenever the value 'dhcp_option_domain:' followed by a newline and then ANY string, replace it with 'dhcp_option_domain:' followed by a newline and a variable.
i.e. if I set a variable dhcp_domain="different.com", then the string above would convert to:
  dhcp_option_domain:
  - different.com
Note that both lines have and need to maintain leading 2 spaces.
I do not want to just do a search and replace on 'test.domain' as I have a few cases to use this and the values could be different each time the sed command is run.
I have tried a few methods such as:
dhcp_domain="something.com"
sed -i 's|dhcp_option_domain:\n.*|dhcp_option_domain:\n - $dhcp_domain|g' filename
however cannot get it to work.
Thanks.
As the manual explains:
sed operates by performing the following cycle on each line of input: first, sed reads one line from the input stream, removes any trailing newline, and places it in the pattern space. Then commands are executed
Your regex (dhcp_option_domain:\n.*) does not match because there is no \n in the pattern space in the first place.
A possible solution:
sed '/dhcp_option_domain:$/{n;c\
- '"$dhcp_domain"'
}'
The /dhcp_option_domain:$/ part is an address. The following command is only executed on lines matching that pattern.
The { } command groups multiple commands into a single block.
The n command prints out the current pattern space and replaces it by the next line of input.
The c\ command replaces the current line with the text that follows in the script (it deletes the pattern space and prints that text instead). Here it gets a bit tricky. We have:
a literal newline in the sed program (required after c\), then
- (this text is output literally), then
' (part of shell syntax, terminating the single-quoted part started by sed '...), then
" (starting a double-quoted part), then
$dhcp_domain (which, because it's in a double-quoted part, interpolates the contents of the dhcp_domain shell variable), then
" (terminating the double-quoted part), then
' (starting another single-quoted part), then
a literal newline again (terminating the text after c\), then
} (closing the block started by {).
By default, sed works line by line (using the newline character to separate lines)
$ cat ip.txt
foo baz
  dhcp_option_domain:
  - test.domain
123
  dhcp_option_domain:
$ dhcp_domain='something.com'
$ sed '/^  dhcp_option_domain:/{n; s/.*/  - '"$dhcp_domain"'/}' ip.txt
foo baz
  dhcp_option_domain:
  - something.com
123
  dhcp_option_domain:
/^  dhcp_option_domain:/ condition to match
{} to group more than one command to be executed when this condition is satisfied
n get next line
s/.*/  - '"$dhcp_domain"'/ replace it as required - note that shell variables won't be expanded inside single quotes, see sed substitution with bash variables for details
note that the last line in the file didn't trigger the change as there was no further line
tested on GNU sed, syntax might vary for other implementations
From GNU sed manual
n
If auto-print is not disabled, print the pattern space, then,
regardless, replace the pattern space with the next line of input. If
there is no more input then sed exits without processing any more
commands.
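The n-based approach can be wrapped in a small function for reuse. This is a hedged sketch, assuming GNU sed and the two-space indentation mentioned in the question; the function name set_dhcp_domain is hypothetical, and the value is pasted into the script unescaped.

```shell
# Hypothetical helper: replace the line after "  dhcp_option_domain:" with
# "  - <domain>". $1 is inserted into the sed script as-is, so this sketch
# assumes it contains no "/", "&", backslash or newline.
set_dhcp_domain() {
  sed '/^  dhcp_option_domain:$/{n;s/.*/  - '"$1"'/;}'
}

printf '  dhcp_option_domain:\n  - test.domain\n' | set_dhcp_domain different.com
```

If the key is the last line of the input, n finds no further line and, in GNU sed, the key is printed unchanged.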
This might work for you (GNU sed):
sed '/dhcp_option_domain:$/{p;s// - '"${var}"'/;n;d}' file
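Match a line ending in dhcp_option_domain:, print it (p), then substitute the matched text with the new domain name (the empty regex in s// reuses the last one, so the leading indent is kept), print the resulting line and fetch the next one (n), and delete the fetched line (d).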

Alternatives to grep/sed that treat new lines as just another character

Both grep and sed handle input line by line and, as far as I know, getting either of them to handle multiple lines isn't very straightforward. What I'm looking for is an alternative or alternatives to these two programs that treat newlines as just another character. Is there any tool that fits these criteria?
The tool you want is awk. It is record-oriented, not line-oriented, and you can specify your record-separator by setting the builtin variable RS. In particular, GNU awk lets you set RS to any regular expression, not just a single character.
Here is an example where awk treats blank lines as the record separator. If you show us what data you have, we can help you with it.
cat file
first line
second line
third line

fourth line
fifth line
sixth line

seventh line
eight line

more data
Running awk on this rebuilds the data, using each blank line as the start of a new record.
awk -v RS= '{$1=$1}1' file
first line second line third line
fourth line fifth line sixth line
seventh line eight line
more data
PS: RS is not being set to file. RS= sets RS to the empty string, the same as RS="", which makes awk read one blank-line-separated paragraph per record.
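The paragraph-mode behaviour can be checked without a file by piping a sample in. The sketch below is illustrative; the helper name join_paragraphs and the sample text are assumptions, not part of the answer above.

```shell
# Hypothetical helper: with RS set to the empty string awk reads one
# blank-line-separated paragraph per record; $1 = $1 rebuilds the record
# with single spaces between fields, and the trailing 1 prints it.
join_paragraphs() {
  awk -v RS= '{ $1 = $1 } 1'
}

printf 'first line\nsecond line\n\nthird line\nfourth line\n' | join_paragraphs
```

In paragraph mode a newline also acts as a field separator, which is why the lines of one paragraph end up joined by spaces.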
1) sed can handle a block of lines together; it does not always have to work line by line.
In sed, normally I use :loop; $!{N; b loop}; to get all the lines available in pattern space delimited by newline.
Sample:
Productivity
Google Search\
Tips
"Web Based Time Tracking,
Web Based Todo list and
Reduce Key Stores etc"
Result (the content between the double quotes is removed):
sed -e ':loop; $!{N; b loop}; s/\"[^\"]*\"//g' thegeekstuff.txt
Productivity
Google Search\
Tips
You should read this page (Unix Sed Tutorial: 6 Examples for Sed Branching Operation); it explains in detail how this works.
http://www.thegeekstuff.com/2009/12/unix-sed-tutorial-6-examples-for-sed-branching-operation/
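The slurp loop can also be written with one -e per command, which avoids relying on how a given sed parses labels followed by semicolons or braces. This is a sketch under that assumption; the helper name strip_quoted and the sample text are made up.

```shell
# Hypothetical helper: read the whole input into the pattern space, then
# delete every double-quoted string, including ones that span lines.
strip_quoted() {
  sed -e ':loop' -e '$!{' -e 'N' -e 'b loop' -e '}' -e 's/"[^"]*"//g'
}

printf 'Tips\n"Web Based Time Tracking,\nWeb Based Todo list"\nend\n' | strip_quoted
```

Because the whole input is held in memory at once, this pattern is best kept for inputs of modest size.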
2) For grep, check whether your grep supports the -z option, which stops it from splitting the input at newlines.
-z, --null-data
Treat the input as a set of lines, each terminated by a zero
byte (the ASCII NUL character) instead of a newline. Like the
-Z or --null option, this option can be used with commands like
sort -z to process arbitrary file names.

Confining Substitution to Match Space Using sed?

Is there a way to substitute only within the match space using sed?
I.e. given the following line, is there a way to substitute only the "." chars that are contained within the matching single quotes and protect the "." chars that are not enclosed by single quotes?
Input:
'ECJ-4YF1H10.6Z' ! 'CAP' ! '10.0uF' ! 'TOL' ; MGCDC1008.S1 MGCDC1009.A2
Desired result:
'ECJ-4YF1H10-6Z' ! 'CAP' ! '10_0uF' ! 'TOL' ; MGCDC1008.S1 MGCDC1009.A2
Or is this just a job to which perl or awk might be better suited?
Thanks for your help,
Mark
Give the following a try which uses the divide-and-conquer technique:
sed "s/\('[^']*'\)/\n&\n/g;s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g;s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g;s/\n//g" inputfile
Explanation:
s/\('[^']*'\)/\n&\n/g - Add newlines before and after each pair of single quotes with their contents
s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g - Using a newline and the single quotes to key on, replace the dot with a dash for strings that end in "Z"
s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g - Using a newline and the single quotes to key on, replace the dot with an underscore for strings that end in "uF"
s/\n//g - Remove the newlines added in the first step
You can restrict the command to acting only on certain lines:
sed "/foo/{s/\('[^']*'\)/\n&\n/g;s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g;s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g;s/\n//g}" inputfile
where you would substitute some regex in place of "foo".
Some versions of sed like to be spoon fed (instead of semicolons between commands, use -e):
sed -e "/foo/{s/\('[^']*'\)/\n&\n/g" -e "s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g" -e "s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g" -e "s/\n//g}" inputfile
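A variation on the same marking idea replaces every dot inside single quotes in a loop, instead of keying on a particular suffix. Note that this is a different technique from the commands above: it uses the t command to repeat one substitution, and it writes an underscore for every quoted dot rather than the dash and underscore mix asked for in the question. It is a sketch assuming GNU sed (for \n in the replacement); the helper name underscore_quoted_dots is hypothetical.

```shell
# Hypothetical helper: mark each quoted string with newlines, replace the
# dots inside the marked strings one at a time, then remove the markers.
underscore_quoted_dots() {
  sed -e "s/'[^']*'/\n&\n/g" \
      -e ':again' \
      -e "s/\(\n'[^'.]*\)\./\1_/" \
      -e 't again' \
      -e 's/\n//g'
}

printf '%s\n' "'10.0uF' ! 'TOL' ; MGCDC1008.S1" | underscore_quoted_dots
```

The newline placed before each opening quote is what keeps the substitution from starting at a closing quote and touching dots outside the quoted strings.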
$ cat phoo1234567_sedFix.sed
#! /bin/sed -f
/'[0-9][0-9]\.[0-9][a-zA-Z][a-zA-Z]'/s/'\([0-9][0-9]\)\.\([0-9][a-zA-Z][a-zA-Z]\)'/'\1_\2'/
This answers your specific question for values shaped like '10.0uF'. If the pattern you need to fix isn't always like the example you provided, then you'll need multiple copies of this line, with the regular expressions modified to match your new change targets.
Note that the command is in 2 parts: "/'[0-9][0-9]\.[0-9][a-zA-Z][a-zA-Z]'/" says the line must match this pattern, while the trailing "s/'\([0-9][0-9]\)\.\([0-9][a-zA-Z][a-zA-Z]\)'/'\1_\2'/" is the part that does the substitution. The single quotes are written again in the replacement so that they are kept in the output. You can add a 'g' after the final '/' to make this substitution happen on all instances of this pattern in each line.
The \(\) pairs in the match pattern become the numbered buffers on the substitution side of the command (i.e. \1 \2). This gives sed a capability that awk's sub() and gsub() lack.
If you're going to do much of this kind of work, I highly recommend O'Reilly's sed & awk book. The time spent going through how sed works will be paid back many times.
I hope this helps.
P.S. as you appear to be a new user, if you get an answer that helps you please remember to mark it as accepted, or give it a + (or -) as a useful answer.
This is a job most suitable for awk or any language that supports breaking or splitting strings.
IMO, using sed for this task, which is regex based, is doable but difficult to read and debug, hence not the most appropriate tool for the job. No offense to sed fanatics.
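awk '{
  for (i = 1; i <= NF; i++) {
    if ($i ~ /\047/) {        # field contains a single quote (octal 047)
      gsub(/\./, "_", $i)     # replace literal dots only
    }
  }
}1' file
The above says: for each field (the field separator is whitespace by default), check whether it contains a single quote, and if it does, replace each "." with "_". The dot is written as the regex /\./ because an unescaped . would match every character. This method is simple and doesn't need a complicated regex.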