removing just the middle of the file with sed

removing just the middle of the file with sed - sed

I want to use sed as a part of the pipeline to preserve just 10 first and 10 last lines of its input. It would not be working on physical files, but just reading from STDIN and outputting to STDOUT. The amount of data in stream is bigger than machine RAM (or its disk space), so it needs to relatively efficient. It also must work in stream mode, without creating temporary files (no writeable filesystems).
Extra bonus if it could display one line instead of all of the middle it deleted:
for example, if I had input lines containing numbers from 1 to 100000, I would need it to output (line with literal <cut> text would be nice, but is optional):
1
2
3
4
5
6
7
8
9
10
<cut>
99991
99992
99993
99994
99995
99996
99997
99998
99999
100000
The best I've come up with is to have it output first 10 lines, and last one 1 line with:
yes ' ' | head -n 100000 |nl | \
sed -e '$q;11,$d'`
which outputs
1
2
3
4
5
6
7
8
9
10
100000
but I need it to output more context (10 lines instead of just 1) at the end of data too.
Update: length of the input stream is unknown and will vary, 100000 above is just an example.
Update: as noted in the question and the tag, I need it in sed, not awk, perl or other programming languages in which it is more easy to accomplish (that requirement, along with no tmp files, is due to fact it is embedded system with limited commands and resources available)
Update: if the input is less then that 10+10 lines, it should ideally just print the whole input

You can try following command:
sed -n 'H; 1,10 { p; b }; g; s/\n[^\n]*//; h; $ { s/\n/<cut>\n/; p }'
sed has two blocks to save content, pattern space and hold space. The first one is used to parse current line, and the second one can be used as a backup. The approach is to save in hold space the last ten lines processed.
H saves every line to hold space, g recover hold space, then remove oldest line and save again to hold space, and in last line ($) print adding your magic word in front of it.
The whole command:
yes ' ' | head -n 100000 |nl|\
sed -n 'H; 1,10 { p; b }; g; s/\n[^\n]*//; h; $ { s/\n/<cut>\n/; p }'
Yields:
1
2
3
4
5
6
7
8
9
10
<cut>
99991
99992
99993
99994
99995
99996
99997
99998
99999
100000
And said that, follow the advice from Ed Morton, because awk is simpler and easier to debug or modify some weeks later.
UPDATE:
You can append to hold space after first ten lines and check if there are more than 10 newline characters in it before removing oldest as FIFO structure:
sed -n '1,10 { p; b }; H; g; /\(\n[^\n]\+\)\{11\}/ s/\n[^\n]*//; h; $ { s/^\n//; p }'
Now it's more challenging to know where to add the <cut> string in the edge case of 20 input lines, but I will leave it as an exercise for you.

sed is for simple substitutions on a single line, that is all. For anything else, including this task, you should be using awk:
$ cat tst.awk
BEGIN { beg=(beg?beg:3); end=(end?end:3) }
NR<=beg
{ rec[(NR-1)%end+1] = $0 }
END {
print "<cut>"
for (i=1;i<=end;i++) {
print rec[(NR+i-1)%end+1]
}
}
$ seq 10 | awk -f tst.awk
1
2
3
<cut>
8
9
10
$ seq 10 | awk -v beg=2 -v end=4 -f tst.awk
1
2
<cut>
7
8
9
10
I see you've added a "it has to be sed" requirement to your question but I'll leave this answer here for future readers looking for a sensible way to perform the task.

This might work for you (GNU sed):
sed '1,10b;:a;$!{N;s/\n/&/10;Ta;D};i\<cut>' file
Print the first 10 lines as normal. Collect the next 11 lines and if it is not the end of file, delete the first of them and repeat always maintaining the last 10 lines. At the end of the file, insert a line containing <cut> and print the remaining 10 lines.

Related

Join certain lines with sed

I have an input which looks like this:
1
2
3
4
5
6
And I want to transform it with sed to :
12
345
6
I know it can be easily done with other tools but I want to do it specifically with sed as a learning exercise.
I have attempted this:
sed ':x ; /^ *$/{ N; s/\n// ; bx; }'
But it prints :
123456
Can someone help me fix this?

Quoting from the GNU sed manual:
A common technique to process blocks of text such as paragraphs (instead of line-by-line) is using the following construct:
sed '/./{H;$!d} ; x ; s/REGEXP/REPLACEMENT/'
The first expression, /./{H;$!d} operates on all non-empty lines, and adds the current line (in the pattern space) to the hold space. On all lines except the last, the pattern space is deleted and the cycle is restarted.
The other expressions x and s are executed only on empty lines (i.e. paragraph separators). The x command fetches the accumulated lines from the hold space back to the pattern space. The s/// command then operates on all the text in the paragraph (including the embedded newlines).
And indeed,
sed '/./{H;$!d} ; x ; s/\n//g'
does what you want.

FWIW here's how to really do that task in UNIX:
$ awk -v RS= -v OFS= '{$1=$1}1' file
12
345
6
The above will work on any UNIX box.

A GNU awk approach:
$ awk -F"\n" '{gsub("\n","");}1' RS='\n{2,}' file
12
345
6
Note it will add a trailing newline\n after last line.

Insert filename into text file with sed

I've been learning about sed and finding it very useful, but cannot find an answer to this in any of the many guides and examples ... I'd like to insert the filename of a text file, minus its path and extension, into a specific line within the text itself. Possible?

In such cases, the correct starting point should be man pages. Manual of sed does not provide a feature for sed to understand "filename", but sed does support inserting a text before/after a line.
As a result you need to isolate the filename separatelly , store the text to a variable and inject this text after/before the line you wish.
Example:
$ a="/home/gv/Desktop/PythonTests/cpu.sh"
$ a="${a##*/}";echo "$a"
cpu.sh
$ a="${a%.*}"; echo "$a"
cpu
$ cat file1
LOCATION 0 X 0
VALUE 1a 2 3
VALUE 1b 2 3
VALUE 1c 2 3
$ sed "2a $a" file1 # Inject the contents of variable $a after line2
LOCATION 0 X 0
VALUE 1a 2 3
cpu
VALUE 1b 2 3
VALUE 1c 2 3
$ sed "2i $a" file1 # Inject the contetns of variable $a before line2
LOCATION 0 X 0
cpu
VALUE 1a 2 3
VALUE 1b 2 3
VALUE 1c 2 3
$ sed "2a George" file1 #Inject a fixed string "George" after line 2
LOCATION 0 X 0
VALUE 1a 2 3
George
VALUE 1b 2 3
VALUE 1c 2 3
Explanation:
a="${a##*/}" : Removes all chars from the beginning of string up to last found slash / (longer match)
a="${a%.*}" : Remove all chars starting from the end of the string up to the first found dot . (short match) . You can also use %% for the longest found dot.
sed "2a $a" : Insert after line 2 the contents of variable $a
sed "2i $q" : Insert before line 2 the contents of $a
Optionally you can use sed -i to make changes in-place / in file under process

wrt I've been learning about sed then you may have been wasting your time as there isn't a lot TO learn about sed beyond s/old/new. Sure there's a ton of other language constructs and things you could do with sed, but in practice you should avoid them all and simply use awk instead. If you edit your question to include concise, testable sample input and expected output and add an awk tag then we can show you how to do whatever you want to do the right way.
Meanwhile, it sounds like all you need is:
$ cat /usr/tmp/file
a
b
c
d
e
$ awk 'NR==3{print gensub(/.*\//,"",1,FILENAME)} 1' /usr/tmp/file
a
b
file
c
d
e
The above inserts the current file name before line 3 of the open file. It uses GNU awk for gensub(), with other awks you'd just use sub() and a variable.

How to remove empty lines to one empty line between sentences in text files?

I have a text file with many empty lines between sentences. I used sed, gawk, grep but they dont work. :(. How can I do now? Thanks.
Myfile: Desired file:
a a
b b
c c
. .
d d
e e
f f
g g
. .
h
i
h j
i k
j .
k
.

You can use awk for this:
awk 'BEGIN{prev="x"}
/^$/ {if (prev==""){next}}
{prev=$0;print}' inputFile
or the compressed one liner:
awk 'BEGIN{p="x"}/^$/{if(p==""){next}}{p=$0;print}' inFl
This is a simple state machine that collapses multi-blank-lines into a single one.
The basic idea is this. First, set the previous line to be non-empty.
Then, for every line in the file, if it and the previous one are blank, just throw it away.
Otherwise, set the previous line to that value, print the line, and carry on.
Sample transcript, the following command:
$ echo '1
2
3
4
5
6
7
8
9
10' | awk 'BEGIN{p="x"}/^$/{if(p==""){next}}{p=$0;print}'
outputs:
1
2
3
4
5
6
7
8
9
10
Keep in mind that this is for truly blank lines (no content). If you're trying to collapse lines that have an arbitrary number of spaces or tabs, that will be a little trickier.
In that case, you could pipe the file through something like:
sed 's/^\s*$//'
to ensure lines with just whitespace become truly empty.
In other words, something like:
sed 's/^\s*$//' infile | awk 'my previous awk command'

To suppress repeated empty output lines with GNU cat:
cat -s file1 > file2

Here's one way using sed:
sed ':a; N; $!ba; s/\n\n\+/\n\n/g' file
Otherwise, if you don't mind a trailing blank line, all you need is:
awk '1' RS= ORS="\n\n" file
The Perl solution is even shorter:
perl -00 -pe '' file

You could do like this also,
awk -v RS="\0" '{gsub(/\n\n+/,"\n\n");}1' file
Explanation:
RS="\0" Once we set the null character as Record Seperator value, awk will read the whole file as single record.
gsub(/\n\n+/,"\n\n"); this replaces one or more blank lines with a single blank line. Note that \n\n regex matches a blank line along with the previous line's new line character.

Here is an other awk
awk -v p=1 'p=="" {p=1;next} 1; {p=$0}' file

How to use 'sed or gawk' to delete a text block until the third line previous the last one

Good day,
I was wondering how to delete a text block like this:
1
2
3
4
5
6
7
8
and delete from the second line until the third line previous the last one, to obtain:
1
2
6
7
8
Thanks in advance!!!
BTW This text block is just an example, the real text blocks I working on are huge and each one differs among them in the line numbers.

Getting the number of lines with wc and using awk to print the requested range:
$ awk 'NR<M || NR>N-M' M=3 N="$(wc -l file)" file
1
2
6
7
8
This allows you to easily change the range by just changing the value of M.

This might work for you (GNU sed):
sed '3,${:a;$!{N;s/\n/&/3;Ta;D}}' file
or i f you prefer:
sed '1,2b;:a;$!{N;s/\n/&/3;Ta;D}' file
These always print the first two lines, then build a running window of three lines.
Unless the end of file is reached the first line is popped off the window and deleted. At the end of file the remaining 3 lines are printed.

since you mentioned huge and also line numbers could be differ. I would suggest this awk one-liner:
awk 'NR<3{print;next}{delete a[NR-3];a[NR]=$0}END{for(x=NR-2;x<=NR;x++)print a[x]}' file
it processes the input file only once, without (pre) calculating total line numbers
it stores minimal data in memory, in all processing time, only 3 lines data were stored.
If you want to change the filtering criteria, for example, removing from line x to $-y, you just simply change the offset in the oneliner.
add a test:
kent$ seq 8|awk 'NR<3{print;next}{delete a[NR-3];a[NR]=$0}END{for(x=NR-2;x<=NR;x++)print a[x]}'
1
2
6
7
8

Using sed:
sed -n '
## Append second line, print first two lines and delete them.
N;
p;
s/^.*$//;
## Read next three lines removing leading newline character inserted
## by the "N" command.
N;
s/^\n//;
N;
:a;
N;
## I will keep three lines in buffer until last line when I will print
## them and exit.
$ { p; q };
## Not last line yet, so remove one line of buffer based in FIFO algorithm.
s/^[^\n]*\n//;
## Goto label "a".
ba
' infile
It yields:
1
2
6
7
8

sed: joining lines depending on the second one

I have a file that, occasionally, has split lines. The split is signaled by the fact that the line starts with '+' (possibly preceeded by spaces).
line 1
line 2
+ continue 2
line 3
...
I'd like join the split line back:
line 1
line 2 continue 2
line 3
...
using sed. I'm not clear how to join a line with the preceeding one.
Any suggestion?

This might work for you:
sed 'N;s/\n\s*+//;P;D' file
These are actually four commands:
N
Append line from the input file to the pattern space
s/\n\s*+//
Remove newline, following whitespace and the plus
P
print line from the pattern space until the first newline
D
delete line from the pattern space until the first newline, e.g. the part which was just printed
The relevant manual page parts are
Selecting lines by numbers
Addresses overview
Multiline techniques - using D,G,H,N,P to process multiple lines

Doing this in sed is certainly a good exercise, but it's pretty trivial in perl:
perl -0777 -pe 's/\n\s*\+//g' input

I'm not partial to sed so this was a nice challenge for me.
sed -n '1{h;n};/^ *+ */{s// /;H;n};{x;s/\n//g;p};${x;p}'
In awk this is approximately:
awk '
NR == 1 {hold = $0; next}
/^ *\+/ {$1 = ""; hold=hold $0; next}
{print hold; hold = $0}
END {if (hold) print hold}
'
If the last line is a "+" line, the sed version will print a trailing blank line. Couldn't figure out how to suppress it.

You can use Vim in Ex mode:
ex -sc g/+/-j -cx file
g global search
- select previous line
j join with next line
x save and close

Different use of hold space with POSIX sed... to load the entire file into the hold space before merging lines.
sed -n '1x;1!H;${g;s/\n\s*+//g;p}'
1x on the first line, swap the line into the empty hold space
1!H on non-first lines, append to the hold space
$ on the last line:
g get the hold space (the entire file)
s/\n\s*+//g replace newlines preceeding +
p print everything
Input:
line 1
line 2
+ continue 2
+ continue 2 even more
line 3
+ continued
becomes
line 1
line 2 continue 2 continue 2 even more
line 3 continued
This (or potong's answer) might be more interesting than a sed -z implementation if other commands were desired for other manipulations of the data you can simply stick them in before 1!H, while sed -z is immediately loading the entire file into the pattern space. That means you aren't manipulating single lines at any point. Same for perl -0777.
In other words, if you want to also eliminate comment lines starting with *, add in /^\s*\*/d to delete the line
sed -n '1x;/^\s*\*/d;1!H;${g;s/\n\s*+//g;p}'
versus:
sed -z 's/\n\s*+//g;s/\n\s*\*[^\n]*\n/\n/g'
The former's accumulation in the hold space line by line keeps you in classic sed line processing land, while the latter's sed -z dumps you into what could be some painful substring regexes.
But that's sort of an edge case, and you could always just pipe sed -z back into sed. So +1 for that.
Footnote for internet searches: This is SPICE netlist syntax.

A solution for versions of sed that can read NUL separated data, like here GNU Sed's -z:
sed -z 's/\n\s*+//g'
Compared to potong's solution this has the advantage of being able to join multiple lines that start with +. For example:
line 1
line 2
+ continue 2
+ continue 2 even more
line 3
becomes
line 1
line 2 continue 2 continue 2 even more
line 3

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

removing just the middle of the file with sed - sed

Related

Join certain lines with sed

Insert filename into text file with sed

How to remove empty lines to one empty line between sentences in text files?

How to use 'sed or gawk' to delete a text block until the third line previous the last one

sed: joining lines depending on the second one

Categories

Resources