Sed inside a while read loop - sed

I have been reading a lot of questions and answers about using sed within a while loop. I think I have the command down correctly, but I seem to get no output once I put all of the pieces together. Can someone tell me what I am missing?
I have an input file with 700 variables, one on each line. I need to use each of these 700 variables within a sed command. I run the following command to verify variables are outputting correctly:
cat Input_File.txt | while read var; do echo $var; done
I then try to add in the sed command as follows:
cat Input_File.txt | while read var; do sed -n "/$var/,+10p" Multi-BLAST_5814.txt >> Multi_BLAST_Subset; done
This command leaves me without an error, but a blinking cursor as if this is an infinite loop. It should use each of the 700 variables, find the corresponding line in Multi_BLAST_5814.txt and output the search variable line and the 10 lines after the search term into a new file, appending each as it goes. I can execute the sed command alone with a manually set single value variable successfully and I can execute the while loop successfully using the input file. Anyone have a thought as to why this is not working?
User, that is exactly what I have done to this point.
I have a large text file (128 MB) with BLAST output. I need to search through this for a subset of results for 769 samples (Out of the 5814 samples that are in the file).
I have created a .txt file with those 769 sample names.
To test grep and sed, I manually assigned a variable with one of the 769 samples names I need to search and can get the results I need as follows:
$ Otu="S41_Folmer_Otu96;size=12;"
$ grep $Otu -A 10 Multi_BLAST_5814.txt
OR
$ sed -n "/$Otu/,+10p" Multi_BLAST_5814.txt
The OUTPUT is exactly what I want as follows:
Query= S41_Folmer_Otu96;size=12;
Length=101
Sequences producing significant alignments: Score(Bits) E Value
gi|58397553|gb|AY830431.1| Scopelocheirus schellenbergi clone... 180 1E-41
gi|306447543|gb|HQ018876.1| Liposcelis paeta isolate CZ cytoc... 174 6E-40
gi|306447533|gb|HQ018871.1| Liposcelis decolor isolate CQ cyt... 104 9E-19
gi|1043259532|gb|KX130860.1| Batocera rufomaculata isolate Br... 99 4E-17
gi|987210821|gb|KR141076.1| Psocoptera sp. BOLD:ACO1391 vouch... 81 1E-11
To Test to make sure the input file contains the correct variables I run the following:
$ cat Input_File.txt
$ while read Otu; do echo $Otu; done <Input_File.txt
S41_Folmer_Otu96;size=12;
S78_Folmer_Otu15;size=538;
S73_Leray_Otu52;size=6;
S66_Leray_Otu93;size=6;
S10_Folmer_Otu10;size=1612;
... All 769 variables
Again, this is exactly what I expect and is correct.
But when I do either of the following commands, nothing is printed to the screen (if I leave off the file write/append) and nothing reaches the file I need to create.
$ cat Input_File.txt | while read Otu; do grep "$Otu" -A 10 Multi_BLAST_5814.txt >> Multi_BLAST_Subset.txt; done
$ cat Input_File.txt | while read Otu; do sed -n "/$Otu/,+10p" Multi_BLAST_5814.txt >> Multi_BLAST_Subset.txt; done
Sed hangs and never closes, leaving me at a blinking cursor. Grep finishes but also gives no output. I am at a loss as to why this is not working. Everything works individually, so I may be left with manually searching all 769 samples by copy/paste.

If you have access to GNU grep, there is no need for a sed command: grep "$var" -A 10 will do the same thing and won't break if $var contains the delimiter used in your sed command.
From man grep :
-A NUM, --after-context=NUM
Print NUM lines of trailing context after matching lines.
Places a line containing a group separator (--) between
contiguous groups of matches. With the -o or --only-matching
option, this has no effect and a warning is given.
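A quick sanity check of that option (a minimal sketch; the input strings here are made up):

```shell
# grep prints the matching line plus the two lines of
# trailing context requested with -A 2.
printf 'one\nmatch\ntwo\nthree\nfour\n' | grep -A 2 match
```

This prints the line containing match followed by two and three.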

Not sure whether you have already attempted it but try breaking the problem into smaller chunks. Simple example below :
$ cat Input_File.txt
one
two
three
$
$ cat file.txt
This is line one
This is line two
This is line three
This is another four
This is another five
This is another six
This is another seven
$
$ cat Input_File.txt | while read var ; do echo $var ; sed -n "/$var/,+1p" file.txt ; done
one
This is line one
This is line two
two
This is line two
This is line three
three
This is line three
This is another four
$

Related

How to find only the first and last line of a file using sed

I have a file called error_log for Apache and I want to see the first line and the last line of this file using the sed command. Would you please help me with how I can do that?
I know how to do that with head and tail commands, but I'm curious if it's possible in sed command too.
I have read the man sed and have googled a lot but nothing is found unfortunately.
This might work for you (GNU sed):
sed '1b;$b;d' file
All sed commands can be prefixed by either an address or a regexp. An address is either a line number or $, which represents the last line. If neither an address nor a regexp is present, the following command applies to every line.
The normal sed cycle presents each line of input (less its newline) in the pattern space. The sed commands are then applied, and the final act of the cycle is to re-attach the newline and print the result.
The b command controls command flow; by itself, it jumps out of the following sed commands to the final act of the cycle, i.e. where the newline is re-attached and the result printed.
The d command deletes the pattern space, and since there is then nothing to be printed, no further processing is executed (including re-attaching the newline and printing the result).
Thus the solution above prints the first line and the last and deletes the rest.
Sed has some command line options, one of which, -n, turns off the implicit printing of the pattern space. The p command prints the current state of the pattern space. Thus the dual of the above solution is:
sed -n '1p;$p' file
N.B. If the input file is only one line, the first solution will print one line whereas the second will print the same line twice. Also, if more than one file is input, both solutions will print the first line of the first file and the last line of the last file, unless the -i option is in place, in which case each file is amended. The -s option replicates this without amending each file, streaming the results to stdout as if each file were treated separately.
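The multi-file behaviour described above can be sketched with two small throwaway files (names made up; -s is GNU sed):

```shell
printf 'a\nb\nc\n' > f1.txt
printf 'x\ny\n' > f2.txt

# Treated as one stream: first line of f1.txt, last line of f2.txt.
sed -n '1p;$p' f1.txt f2.txt

# GNU sed's -s treats each file separately: first and last of both.
sed -sn '1p;$p' f1.txt f2.txt
```

The first command prints a and y; the second prints a, c, x, y.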
This will work:
sed -n '1p ; $p' error_log
1p will print the first line and $p will print the last line.
As a suggestion, take a look at info sed, not only man sed. You can find some examples related to your question in paragraph 2.1.
First line:
sed '2,$d' error_log
Last line:
sed '$!d' error_log
Based on your new requirement to output nothing if the input file is just 1 line (see How to find only the first and last line of a file using sed):
awk 'NR==1{first=$0} {last=$0} END{if (NR>1) print first ORS last}'
Original answer:
This is one of those things that you can, at face value, do easily enough in sed:
$ seq 3 7
3
4
5
6
7
$ seq 3 7 | sed -n '1p; $p'
3
7
but then how to handle edge cases, such as a single line of input, is non-obvious; e.g. is this REALLY the correct output:
$ printf 'foo\n' | sed -n '1p; $p'
foo
foo
or is the correct output just:
foo
and if the latter, how do you tweak that sed command to produce that output? @potong suggested a GNU sed command:
$ printf 'foo\n' | sed '1b;$b;d'
foo
which works but may be GNU-only (idk) and, more importantly, doesn't look much like the command we started with, so the tiniest change in requirements meant a complete rewrite using different constructs.
Now, how about if you want to enhance it to, say, only print the first and last line if the file contained foo? I expect that'd be another challenging exercise with sed and probably involve non-portable constructs too.
It's just all pointless to learn how to do this with sed when you can use a different tool like awk and do whatever you like in a simple, consistent, portable syntax:
$ seq 3 7 |
awk 'NR==1{first=$0} {last=$0} END{print first ORS last}'
3
7
$ printf 'foo\n' |
awk 'NR==1{first=$0} {last=$0} END{print first ORS last}'
foo
foo
$ printf 'foo\n' |
awk 'NR==1{first=$0} {last=$0} END{print first (NR>1 ? ORS last : "")}'
foo
$ printf '3\nfoo\n7\n' |
awk 'NR==1{first=$0} /foo/{f=1} {last=$0} END{if (f) print first (NR>1 ? ORS last : "")}'
3
7
$ printf '3\nbar\n7\n' |
awk 'NR==1{first=$0} /foo/{f=1} {last=$0} END{if (f) print first (NR>1 ? ORS last : "")}'
$
Notice that:
Every command looks like every other command.
A minor change in requirements leads to a minor change in the code, not a complete rewrite.
Once you learn how to do any given thing A, how to do similar things B, C, D, etc. just builds on top of the syntax you already used; you don't have to learn a completely different syntax.
Each of those commands will work using any awk in any shell on every UNIX box.
Now, how about if you want to do that for multiple files such as would be created by the following commands?
$ seq 3 7 > file1
$ seq 12 25 > file2
With awk you can just store the lines in an array for printing in the END:
$ awk 'FNR==1{first[++cnt]=$0} {last[cnt]=$0}
END{for (i=1;i<=cnt;i++) print first[i] ORS last[i]}' file1 file2
3
7
12
25
or with GNU awk you can print them from ENDFILE:
$ awk 'FNR==1{first=$0} {last=$0} ENDFILE{print first ORS last}' file1 file2
3
7
12
25
With sed? An exercise left for the reader.

sed, xargs and stdbuf - how to get only first n matches of a pattern from a file

I have a file with patterns (1 line = 1 pattern) I want to look for in a big text file; only one (or no) pattern will be found on each line of the infile. Once a match is found, I want to retrieve the characters immediately before it. The first part is to acquire the patterns for sed:
cat patterns.txt | xargs -I '{}' sed -n 's/{}.*$//p' bigtext.txt
That works ok - the downside being that potentially I'll have hundreds of thousands of matches. I don't want/need all the matches - a fair representation of 1K hits would be enough. And here is where I struggle: I've read that in order to limit the number of hits of sed, I should use stdbuf (gstdbuf in my case) and pipe the whole thing through head. But I am not sure where to place the stdbuf command:
cat patterns.txt | xargs -I '{}' gstdbuf -oL -eL sed -n 's/{}.*$//p' bigtext.txt | head -n100
When I tried this, the process takes as long as if it was running sed on the whole file and then getting the head of that output, while my wish is to stop searching after 100 or 1000 matches. Any ideas on the best way of accomplishing this?
Is the oneliner you have provided really what you wanted, especially since you mention wanting a fair sample? As it stands right now, it feeds patterns.txt into xargs... which invokes sed for each pattern individually, one after another. The whole output of xargs is fed to head, which chops it off after n lines. In other words, your first pattern can already exhaust all the lines you wanted to see, even though the other patterns could have matched any number of times on lines occurring before the matches presented to you. Detailed example below.
If I have patterns.txt of:
_Pat1
_Pat2
_Pat3
And bigtext.txt with:
1matchx_Pat1x
2matchx_Pat2x
2matchy_Pat2y
2matchz_Pat2z
3matchx_Pat3x
3matchy_Pat3y
3matchz_Pat3z
1matchy_Pat1y
1matchz_Pat1z
If I run your oneliner limited to five hits, I do not get this result (the first five matches across all three patterns as they occur in the file):
1matchx
2matchx
2matchy
2matchz
3matchx
But this (all 3 matches for _Pat1 plus 2 matches for _Pat2, after which I've run out of output lines):
1matchx
1matchy
1matchz
2matchx
2matchy
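This behaviour is easy to reproduce with the sample files above (a minimal sketch; stdbuf is omitted since it does not change the ordering):

```shell
printf '_Pat1\n_Pat2\n_Pat3\n' > patterns.txt
printf '%s\n' 1matchx_Pat1x 2matchx_Pat2x 2matchy_Pat2y 2matchz_Pat2z \
    3matchx_Pat3x 3matchy_Pat3y 3matchz_Pat3z 1matchy_Pat1y 1matchz_Pat1z > bigtext.txt

# One full pass of sed per pattern, so all _Pat1 hits come first
# and head cuts the later patterns off.
xargs -I '{}' sed -n 's/{}.*$//p' bigtext.txt < patterns.txt | head -n 5
```

The output is 1matchx, 1matchy, 1matchz, 2matchx, 2matchy, exactly as described.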
Now to your performance problem, which is partially related. I have to admit that I could not reproduce it. I took your example from the comment, blew the "big" file up to 1 GB in size by repeating the pattern, and ran your oneliner:
$ time { cat patterns.txt | xargs -I '{}' stdbuf -oL sed -n 's/{}.*$//p' bigtext.txt | head -5 ; }
1aaa
2aaabbb
3aaaccc
1aaa
2aaabbb
xargs: stdbuf: terminated by signal 13
real 0m0.012s
user 0m0.013s
sys 0m0.008s
Note I've dropped the -eL; stderr is usually unbuffered (which is what you usually want) and doesn't play any role here. Also note I ran stdbuf without the "g" prefix, which tells me you're probably on a system where GNU tools are not the default... and probably the reason why you get different behavior. I'll try to explain what is going on, venture a few guesses... and conclude with a suggestion. Also note that I really did not need to use stdbuf (manipulate buffering) at all, or rather it had no appreciable impact on the result; but again, this could be platform, tools, and scenario specific.
Reading your pipeline from its end: head reads standard input as it is piped in from xargs (and by extension from the sed, or its stdbuf wrapper, which xargs forks; both are attached to the pipe's writing end) until the limit of lines to print has been reached, and then head terminates. Doing so "breaks" the pipe, and xargs and sed (or the stdbuf it was wrapped in) receive a SIGPIPE signal, upon which by default they terminate as well (you can see that in the output of my run: xargs: stdbuf: terminated by signal 13).
What does stdbuf -oL do, and why might someone have suggested it? When not reading/writing a console (which would usually be line buffered) but using pipes, we usually get fully buffered I/O instead. stdbuf -oL changes that back to line buffering. Without it, the processes involved communicate in larger chunks, and it could take head longer to realize it is done and needs no further input, while sed keeps running to see if there are any further matches. As mentioned, on my system (4K buffers) and with that (repeating pattern) example, this made no real difference. Also note that while line buffering decreases the risk of not realizing we are done, it increases the overhead involved in communication between the processes.
So why would these mechanics not yield the same expected results for you? A couple of options come to mind:
Since you fork and run sed once per pattern, reading the whole file each time, you can get a series of several runs without any hits. I'd guess this is actually the likely case.
Since you give sed a file to read from, you may have a different implementation of sed that reads a lot more input before acting on the file content (mine reads 4K at a time). This is not a likely cause, but in theory you could also feed sed line by line to force smaller chunks and get that SIGPIPE sooner.
Now, assuming that sequential pattern-by-pattern matching is actually not desired, the summary of all of the above would be: process your patterns into a single one first, and then perform a single pass over the "big" file (optionally capping the output, of course). It might be worth switching from shell to something a bit more comfortable to use, or at least not keeping the oneliner format, which is likely to turn confusing.
Not true to my own recommendation, an awk script called like this prints the first 5 hits and quits (note the closing parenthesis is appended explicitly, since command substitution strips the trailing newline):
awk -v patts="$(cat patterns.txt)" -v last=5 'BEGIN{patts="(" patts ")" ; gsub(/\n/, "|", patts) ; cnt=1 ;} $0~patts{sub(patts ".*", ""); print; cnt++;} cnt>last{exit;}' bigtext.txt
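Against the sample files from earlier in this thread, the same script (reformatted here for readability, with the alternation closed explicitly because command substitution strips the trailing newline) yields a fair sample across all patterns:

```shell
printf '_Pat1\n_Pat2\n_Pat3\n' > patterns.txt
printf '%s\n' 1matchx_Pat1x 2matchx_Pat2x 2matchy_Pat2y 2matchz_Pat2z \
    3matchx_Pat3x 3matchy_Pat3y 3matchz_Pat3z 1matchy_Pat1y 1matchz_Pat1z > bigtext.txt

# Join the patterns into one alternation, keep the part of each
# matching line before the match, and stop after 5 hits.
awk -v patts="$(cat patterns.txt)" -v last=5 '
    BEGIN { patts = "(" patts ")"; gsub(/\n/, "|", patts); cnt = 1 }
    $0 ~ patts { sub(patts ".*", ""); print; cnt++ }
    cnt > last { exit }
' bigtext.txt
```

This single pass prints 1matchx, 2matchx, 2matchy, 2matchz, 3matchx, in file order rather than pattern order.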
You can give grep a file of patterns to match with -f file. You can also specify the number of matches to find before quitting with -m count.
So this command will get you the first 5 lines that match:
grep -f patterns.txt -m 5 bigtext.txt
Now, trimming from the match to the end of the line is a bit more difficult.
Assuming you use bash, we can build a regex from the file, like this:
while IFS='' read -r line || [[ -n "$line" ]]; do
subRegex="s/$line.*//;"${subRegex}
done < patterns.txt
Then use this in a sed command. The resulting code becomes:
while IFS='' read -r line || [[ -n "$line" ]]; do
subRegex="s/$line.*//;"${subRegex}
done < patterns.txt
grep -f patterns.txt -m 5 bigtext.txt | sed "$subRegex"
The sed command is only running on the lines that have already matched from the grep, so it should be fairly performant.
Now if you call this a lot you could put it in a function
function findMatches() {
local matchCount=${1:-5} # default to 5 matches
local subRegex
while IFS='' read -r line || [[ -n "$line" ]]; do
subRegex="s/$line.*//;"${subRegex}
done < patterns.txt
grep -f patterns.txt -m ${matchCount} bigtext.txt | sed "${subRegex}"
}
Then you can call it like this
findMatches 5
findMatches 100
Update
Based on the sample files you gave, this solution does produce the expected result: 1aaa 2aaabbb 3aaaccc 4aaa 5aaa.
However, your comments mention that each pattern is 120 characters, each line of the bigfile is 250 characters, and the file is 10 GB.
You didn't mention how many patterns you might have. So I tested, and it seems that the sed command done inline falls apart someplace before 50 patterns.
(Of course, if your samples are really how the data look, then you could trim each line based on non-AGCT characters rather than on the patterns file, which would be much quicker.)
But based on the original question. You can generate a sed script in a separate file based on patterns.txt. Like this:
sed -e "s/^/s\//g;s/$/.*\$\/\/g/g;" patterns.txt > temp.sed
then use this temp file on the sed command.
grep -f patterns.txt -m 5 bigtext.txt | sed -f temp.sed
The grep stops after finding X matches, and the sed trims those... The new function runs on my machine in a couple of seconds.
For testing I created a 2GB file of 250 character AGCT combos. And another file with 50+ patterns, 120 characters each with a few of these patterns taken from random lines of the bigtext file.
function findMatches() {
sed -e "s/^/s\//g;s/$/.*\$\/\/g/g;" patterns.txt > temp.sed
grep -f patterns.txt -m ${1:-5} bigtext.txt | sed -f temp.sed
}

Multiple lines from one with sed

Is there some way in sed to create multiple output lines from a single input line? I have a template file (there are more lines in the file, I'm just simplifying it):
http://hostname:#PORT#
I am currently using sed to replace #PORT# with a real port. However, I'd like to be able to pass in multiple ports, and have sed create a line for each. Is that possible?
I'm assuming you would want to duplicate the whole line for each port number. In that case it's easier to think of it as replacing the port numbers with the URL:
$ cat ports.in
1
2
3
4
5
$ sed 's#^\([0-9]*\)$#http://hostname:\1#' ports.in
http://hostname:1
http://hostname:2
http://hostname:3
http://hostname:4
http://hostname:5
To do it the other way around is easier with awk:
$ cat url.in
http://hostname:#PORT#
$ awk '/^[0-9]/ {ports[++i]=$0} /^http/ {sub(":#PORT#", ":%d\n"); for (p in ports) printf($0, ports[p])}' ports.in url.in
http://hostname:2
http://hostname:3
http://hostname:4
http://hostname:5
http://hostname:1
This reads both ports.in and url.in; if a line starts with a number, it is assumed to be a port number from ports.in. Otherwise, if the line starts with http, it is assumed to be a URL from url.in; the script replaces the port placeholder with a printf format string and then prints the URL once for each port number read. It will fail to do the right thing if the files are fed in the wrong order.
A similar solution, but taking the URL from a shell variable:
$ myurl="http://hostname:#PORT#"
$ awk -v url="$myurl" 'BEGIN{sub(":#PORT#", ":%d\n",url)} /^[0-9]/ {ports[++i]=$0} END {for (p in ports) printf(url, ports[p])}' ports.in
http://hostname:2
http://hostname:3
http://hostname:4
http://hostname:5
http://hostname:1
It seems you have multiple templates and multiple ports to apply to them. Here's how to do it in a shell script (tested with bash), but you'll need two sed executions if you want to keep it simple, because you have two multiply-valued inputs. It is mathematically a cross product of the templates and the substitution values.
ports='80
8080
8081'
templates='http://domain1.net:%PORT/
http://domain2.org:%PORT/
http://domain3.com:%PORT/'
meta="s/(.*)/g; s|%PORT|\1|p; /p"
sed="`echo \"$ports\" |sed -rn \"$meta\" |tr '\n' ' '`"
echo "$templates" |sed -rn "h; $sed"
The shell variable meta is a meta sed script because it writes another sed script. The h saves the pattern buffer in the sed hold space. The sed commands generated from the meta sed recall, substitute, and print for each port. This is the result.
http://domain1.net:80/
http://domain1.net:8080/
http://domain1.net:8081/
http://domain2.org:80/
http://domain2.org:8080/
http://domain2.org:8081/
http://domain3.com:80/
http://domain3.com:8080/
http://domain3.com:8081/
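For comparison, the same cross product can be sketched with a plain nested loop and no sed at all (bash syntax for the ${t/.../...} substitution; the values are the ones from above, just space-separated):

```shell
ports='80 8080 8081'
templates='http://domain1.net:%PORT/ http://domain2.org:%PORT/ http://domain3.com:%PORT/'

# For every template, print one line per port with %PORT substituted.
for t in $templates; do
    for p in $ports; do
        printf '%s\n' "${t/\%PORT/$p}"
    done
done
```

This emits the same nine URLs; the meta-sed version above avoids the explicit loop at the cost of readability.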

Extracting the contents between two different strings using bash or perl

I have tried to scan through the other posts in stack overflow for this, but couldn't get my code work, hence I am posting a new question.
Below is the content of file temp.
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/<env:Body><dp:response xmlns:dp="http://www.datapower.com/schemas/management"><dp:timestamp>2015-01-
22T13:38:04Z</dp:timestamp><dp:file name="temporary://test.txt">XJzLXJlc3VsdHMtYWN0aW9uX18i</dp:file><dp:file name="temporary://test1.txt">lc3VsdHMtYWN0aW9uX18i</dp:file></dp:response></env:Body></env:Envelope>
This file contains the base64 encoded contents of two files names test.txt and test1.txt. I want to extract the base64 encoded content of each file to seperate files test.txt and text1.txt respectively.
To achieve this, I have to remove the xml tags around the base64 contents. I am trying below commands to achieve this. However, it is not working as expected.
sed -n '/test.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test.txt">##g'|perl -p -e 's#</dp:file>##g' > test.txt
sed -n '/test1.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test1.txt">##g'|perl -p -e 's#</dp:file></dp:response></env:Body></env:Envelope>##g' > test1.txt
Below command:
sed -n '/test.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test.txt">##g'|perl -p -e 's#</dp:file>##g'
produces output:
XJzLXJlc3VsdHMtYWN0aW9uX18i
<dp:file name="temporary://test1.txt">lc3VsdHMtYWN0aW9uX18i</dp:response> </env:Body></env:Envelope>`
However, in the output I am expecting only the first line, XJzLXJlc3VsdHMtYWN0aW9uX18i. Where am I committing a mistake?
When I run the below command, I get the expected output:
sed -n '/test1.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test1.txt">##g'|perl -p -e 's#</dp:file></dp:response></env:Body></env:Envelope>##g'
It produces below string
lc3VsdHMtYWN0aW9uX18i
I can then easily route this to test1.txt file.
UPDATE
I have edited the question to update the source file content. The source file doesn't contain any newline character; the current solution will not work in that case (I have tried it and it failed). wc -l temp must output 1.
OS: solaris 10
Shell: bash
sed -n 's_<dp:file name="\([^"]*\)">\([^<]*\).*_\1 -> \2_p' temp
I added \1 -> to show the link from file name to content; if you want the content only, just remove that part.
This is a POSIX version, so on GNU sed use --posix.
It assumes that the base64 encoded content is on the same line as the surrounding tags (and not spread over several lines; that case would need some modification).
Thanks to JID for the full explanation below.
How it works
sed -n
The -n means no printing, so unless explicitly told to print, there will be no output from sed.
's_
This is to substitute the following regex using _ to separate regex from the replacement.
<dp:file name=
Regular text
"\([^"]*\)"
The brackets are a capture group and must be escaped unless the -r option is used (-r is not available in POSIX sed). Everything inside the brackets is captured. [^"]* means 0 or more occurrences of any character that is not a quote, so this just captures anything between the two quotes.
>\([^<]*\)<
Again uses the capture group this time to capture everything between the > and <
.*
Everything else on the line
_\1 -> \2
This is the replacement, so replace everything in the regex before with the first capture group then a -> and then the second capture group.
_p
Means print the line
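The command can be exercised on a simplified two-line version of the input (a sketch; in the real file both elements sit on one line, as the update above notes):

```shell
# Simplified input: one dp:file element per line.
cat > temp <<'EOF'
<dp:file name="temporary://test.txt">XJzLXJlc3VsdHMtYWN0aW9uX18i</dp:file>
<dp:file name="temporary://test1.txt">lc3VsdHMtYWN0aW9uX18i</dp:file>
EOF

# Print "<name> -> <content>" for each dp:file element.
sed -n 's_<dp:file name="\([^"]*\)">\([^<]*\).*_\1 -> \2_p' temp
```

This prints temporary://test.txt -> XJzLXJlc3VsdHMtYWN0aW9uX18i and the corresponding line for test1.txt.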
Resources
http://unixhelp.ed.ac.uk/CGI/man-cgi?sed
http://www.grymoire.com/Unix/Sed.html
/usr/xpg4/bin/sed works well here.
/usr/bin/sed does not work as expected if the file contains just one line.
The command below works for a file containing only a single line:
/usr/xpg4/bin/sed -n 's_<env:Envelope\(.*\)<dp:file name="temporary://BackUpDir/backupmanifest.xml">\([^>]*\)</dp:file>\(.*\)_\2_p' securebackup.xml 2>/dev/null
Without 2>/dev/null this sed command outputs the warning sed: Missing newline at end of file.
This is for the following reason:
The default Solaris sed ignores a last line not terminated by a newline, so as not to break existing scripts, because a line was required to be terminated by a newline in the original Unix implementation.
GNU sed has a more relaxed behavior, and the POSIX implementation accepts the fact but outputs a warning.

the 'd' command in the sed utility

From the sed documentation:
d Delete the pattern space; immediately start next cycle.
What does it mean by next cycle? My understanding is that sed will not apply the following commands after the d command and it starts to read the next line from the input stream and processes it. But it seems that this is not true. See this example:
[root@localhost ~]# cat -A test.txt
aaaaaaaaaaaaaa$
$
bbbbbbbbbbbbb$
$
$
ccccccccc$
ddd$
$
eeeeeee$
[root@localhost ~]# cat test.txt | sed '/^$/d;p;p'
aaaaaaaaaaaaaa
aaaaaaaaaaaaaa
aaaaaaaaaaaaaa
bbbbbbbbbbbbb
bbbbbbbbbbbbb
bbbbbbbbbbbbb
ccccccccc
ccccccccc
ccccccccc
ddd
ddd
ddd
eeeeeee
eeeeeee
eeeeeee
[root@localhost ~]#
If d immediately started the next cycle, the p commands would not produce any output.
Anyone can help me to explain it please? Thanks.
It means that sed will read the next line and start processing it.
Your test script doesn't do what you think. It matches the empty lines and applies the delete command to them. They don't appear, so the print statements don't get applied to the empty lines. The two print commands aren't connected to the pattern for the delete command, so the non-empty lines are printed three times. If you instead try
sed '/./d;p;p' test.txt # matches all non-empty lines
nothing will be printed other than the blank lines, three times each.
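That claim is easy to check with a minimal three-line sample piped through the command:

```shell
# 'a' and 'b' match /./ and are deleted; the empty line is printed
# twice by p;p and once more by the automatic end-of-cycle print.
printf 'a\n\nb\n' | sed '/./d;p;p'
```

The output is three blank lines and nothing else.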
a) You can combine multiple commands for one pattern with curly braces:
sed '/^$/{d;p;p}' test.txt
aaaaaaaaaaaaaa
bbbbbbbbbbbbb
ccccccccc
ddd
eeeeeee
In '/^$/d;p;p' the command d is only applied to empty lines; every other line is printed two additional times. To bind the p commands to the pattern, you have to use curly braces. Then the p commands are skipped as well, but because of the skip to the next cycle, not because they don't match.
b) Useless use of cat. (already shown)