The usage of sed c\ command under AIX - sed

According to the AIX man page
http://www-01.ibm.com/support/knowledgecenter/ssw_aix_71/com.ibm.aix.cmds5/sed.htm
the c\ command is defined as follows:
(2)c\
Text
Deletes the pattern space. With 0 or 1 address or at the end of a 2-address range, places the Text variable in output and then starts the next cycle.
I am confused by the "0 or 1 address" wording in the explanation above. Can anyone give an example of how the c\ command is used?
Thanks

The c command replaces the line at the current position (usually the current line, although the pattern space may hold more than that if several lines were loaded, for example with N).
The "0, 1 or 2 addresses" wording means that you can put an address (a line number or a pattern) or an address range before the command, like this:
# no address, so the current line (every line is changed)
c\
Add this line
# 1 address, so the line with that number or each line matching the pattern
2 c\
at line 2
/This/ c\
at each line that contain "This"
# 2 address
1,3 c\
for line 1 to 3 only
/Trig/,$ c\
From first line that contain "Trig" until the end
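The one-address and two-address cases can be tried out with a small sketch like the one below. It assumes a made-up four-line input (the words one to four and the text CHANGED are purely illustrative) and was reasoned through for GNU sed; per the man page wording quoted in the question, AIX sed should behave the same way, but that is an assumption. With a range, the replacement text is written once, at the end of the range.

```shell
# Hypothetical four-line input; not taken from the AIX documentation.
input=$(printf 'one\ntwo\nthree\nfour')

# One address: only line 2 is replaced.
one_addr=$(printf '%s\n' "$input" | sed '2c\
CHANGED')
printf '%s\n' "$one_addr"
# one
# CHANGED
# three
# four

# Two addresses: lines 2-3 are deleted and the text is written once.
two_addr=$(printf '%s\n' "$input" | sed '2,3c\
CHANGED')
printf '%s\n' "$two_addr"
# one
# CHANGED
# four
```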

How to parse sed regex syntax?

sed -i "0,/test/s//#test/g" file.txt
I do not know how to parse this regex. It is commenting out test by putting #, but my questions are
what is "0," at the beginning?
why is it not like "s/test/#test/g"? In other words, why is the s in the middle?
Any help is appreciated.
Let's break it down into smaller pieces:
https://www.gnu.org/software/sed/manual/sed.html#sed-script-overview
sed commands follow this syntax:
[addr]X[options]
X is a single-letter sed command. [addr] is an optional line address. If [addr] is specified, the command X will be executed only on the matched lines.
And
https://www.gnu.org/software/sed/manual/sed.html#Range-Addresses
An address range can be specified by specifying two addresses separated by a comma (,). An address range matches lines starting from where the first address matches, and continues until the second address matches (inclusively)
In the case of 0,/test/s//#test/g the address part is 0,/test/ because s is the command. An address part of 0,/test/ means the s command is only executed on lines inside that range. If the sed command was s/test/#test/g there wouldn't be an address part and the s command would be attempted on every line in the file.
https://www.gnu.org/software/sed/manual/sed.html#index-addr1_002c_002bN
A line number of 0 can be used in an address specification like 0,/regexp/ so that sed will try to match regexp in the first input line too. In other words, 0,/regexp/ is similar to 1,/regexp/, except that if addr2 matches the very first line of input the 0,/regexp/ form will consider it to end the range, whereas the 1,/regexp/ form will match the beginning of its range and hence make the range span up to the second occurrence of the regular expression.
Note that this is the only place where the 0 address makes sense; there is no 0-th line and commands which are given the 0 address in any other way will give an error.
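So in 0,/test/s//#test/g, the address part 0,/test/ restricts the s command to the lines from the start of the file up to and including the first line that matches /test/, even if that is the very first line. Since s only changes lines that contain test, the net effect is that only the first matching line is changed.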
https://www.gnu.org/software/sed/manual/sed.html#index-empty-regular-expression
The empty regular expression ‘//’ repeats the last regular expression match (the same holds if the empty regular expression is passed to the s command).
So 0,/test/s//#test/g is the same as 0,/test/s/test/#test/g because the empty regular expression matches the one that was used in the address part - but it can be left out because writing the same regex twice just makes the whole command less readable.
In conclusion:
s/test/#test/g does the replacement on every line in the file that contains test
0,/test/s//#test/g does the replacement only on the first line in the file that contains test
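The difference between a 0 and a 1 start address can be seen in a small sketch. It assumes GNU sed (the 0,/regexp/ form is a GNU extension) and a made-up three-line input whose first line already matches. The second command spells out the regex in s instead of using the empty //, because on line 1 the range's end regex has not been evaluated yet.

```shell
# Hypothetical input in which the very first line already matches.
input=$(printf 'test\nfoo\ntest')

# 0,/test/ : the range may end on line 1, so only the first match changes.
zero_range=$(printf '%s\n' "$input" | sed '0,/test/s//#test/')
printf '%s\n' "$zero_range"
# #test
# foo
# test

# 1,/test/ : the end address is first checked on line 2, so the range
# runs through line 3 and both matching lines change.
one_range=$(printf '%s\n' "$input" | sed '1,/test/s/test/#test/')
printf '%s\n' "$one_range"
# #test
# foo
# #test
```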

Sed to replace last character on condition

I have a file which has following lines
172XI207 X123955 1
412XE401 XE05689 1
412XI402 XI9515 1
412XI403 XI06702 1
412XE404 XE75348 1
I want to change the last column to 2 if the first two characters of the second column are XE.
The result should be like below
172XI207 X123955 1
412XE401 XE05689 2
412XI402 XI9515 1
412XI403 XI06702 1
412XE404 XE75348 2
I want to use sed (not awk). Can someone please let me know how this can be achieved using sed?
Many sed commands take an address or address range (see the man page for the gory details). Probably the most common command is s, and it is among those that take an address range, meaning it doesn't need to apply to every line. An address can be a regular expression. The general form of the s command is:
{address}s/pattern/replacement/
For you, the address (the matching RE) is / XE/ (assuming your columns are space separated; change that to a tab if necessary), the pattern is 1$ and the replacement is 2. Therefore:
/ XE/s/1$/2/
or as a command line
sed -e '/ XE/s/1$/2/' < oldfile > newfile
EDIT: oops, second column, not start of line.
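As a quick check, the address-plus-substitute command can be run against the sample rows from the question. This is only a sketch: the variable names data and result are illustrative, and it assumes the columns are separated by single spaces.

```shell
# Sample rows copied from the question.
data='172XI207 X123955 1
412XE401 XE05689 1
412XI402 XI9515 1
412XI403 XI06702 1
412XE404 XE75348 1'

# The address / XE/ selects rows whose second column starts with XE;
# s/1$/2/ then rewrites the trailing 1 on those rows only.
result=$(printf '%s\n' "$data" | sed -e '/ XE/s/1$/2/')
printf '%s\n' "$result"
```

Only the two rows whose second column begins with XE should end in 2; the other rows pass through unchanged.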
This command should do the trick (providing you are looking at myfile.txt)
sed -e '/ XE/ s/1$/2/' myfile.txt
You can make the replacement permanent by adding the -i option, which modifies the file in place. Make sure the output is exactly what you expect before doing so.
Edit: based on question in comments, here is a command that matches on 3rd column and replaces on fifth.
sed -e 's/^\(\(\w\+\W\+\)\{2\}XE\(\w\+\W\+\)\{2\}\)1/\12/'
Or, as an alternative, you can first select the line and then substitute:
sed -e '/^\(\w\+\W\+\)\{2\}XE/ s/^\(\(\w\+\W\+\)\{4\}\)1/\12/'

How to use 'sed or gawk' to delete a text block until the third line previous the last one

Good day,
I was wondering how to delete a text block like this:
1
2
3
4
5
6
7
8
and delete everything after the second line up to (but not including) the third line from the end, to obtain:
1
2
6
7
8
Thanks in advance!!!
BTW, this text block is just an example; the real text blocks I am working on are huge, and each has a different number of lines.
Getting the number of lines with wc and using awk to print the requested range:
$ awk 'NR<M || NR>N-M' M=3 N="$(wc -l < file)" file
1
2
6
7
8
This allows you to easily change the range by just changing the value of M.
This might work for you (GNU sed):
sed '3,${:a;$!{N;s/\n/&/3;Ta;D}}' file
or if you prefer:
sed '1,2b;:a;$!{N;s/\n/&/3;Ta;D}' file
These always print the first two lines, then build a running window of three lines.
Unless the end of file is reached the first line is popped off the window and deleted. At the end of file the remaining 3 lines are printed.
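The sliding-window behaviour described above can be checked on the eight-line sample. This sketch assumes GNU sed, since the T command (branch if no substitution succeeded) is a GNU extension; the variable name window_out is illustrative.

```shell
# Eight numbered lines, as in the question.
window_out=$(printf '%s\n' 1 2 3 4 5 6 7 8 |
  sed '1,2b;:a;$!{N;s/\n/&/3;Ta;D}')
printf '%s\n' "$window_out"
# 1
# 2
# 6
# 7
# 8
```

Lines 1 and 2 skip the loop via b. From line 3 on, N grows the window until it holds three newlines, and D then drops the oldest line until the last input line has been read.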
Since you mentioned that the files are huge and that the line counts differ, I would suggest this awk one-liner:
awk 'NR<3{print;next}{delete a[NR-3];a[NR]=$0}END{for(x=NR-2;x<=NR;x++)print a[x]}' file
It processes the input file only once, without (pre)calculating the total number of lines.
It keeps minimal data in memory: at any point during processing, only 3 lines are stored.
If you want to change the filtering criteria, for example, removing from line x to $-y, you simply change the offsets in the one-liner.
A test:
kent$ seq 8|awk 'NR<3{print;next}{delete a[NR-3];a[NR]=$0}END{for(x=NR-2;x<=NR;x++)print a[x]}'
1
2
6
7
8
Using sed:
sed -n '
## Append second line, print first two lines and delete them.
N;
p;
s/^.*$//;
## Read next three lines removing leading newline character inserted
## by the "N" command.
N;
s/^\n//;
N;
:a;
N;
## I will keep three lines in buffer until last line when I will print
## them and exit.
$ { p; q };
## Not last line yet, so remove one line of buffer based in FIFO algorithm.
s/^[^\n]*\n//;
## Goto label "a".
ba
' infile
It yields:
1
2
6
7
8

SED: Operate on Last seven lines regardless of file length

I would like to operate on the last 7 lines of a file with sed regardless of the filelength.
According to a related question this type of range won't work: $-6,$ {..commands..}
What is the equivalent that will?
Pipe the output of tail -7 into sed.
tail -7 test.txt | sed -e "s/e/WWW/"
You could just switch from sed(1) to ed(1), the commands are about the same. In this case, the command is the same, except with no limitations on address range.
$ cat > fl7.ed
ed - $1 << \eof
1,7s/$/ (one of the first seven lines)/
$-6,$s/$/ (one of the last seven lines)/
w
q
eof
$ sh fl7.ed yourfile
perl -lne 'END{print join$\,@a,"-",@b}push@a,$_ if@a<6;push@b,$_;shift@b if@b>7'
In the END{} block you can do whatever is required; @a contains the first 6, @b the last 7 lines as requested.
This should work for you:
sed '1{N;N;N;N;N};N;$s/foo/bar/g;P;D' inputfile
Explanation:
1{N;N;N;N;N} - when the first line is read, load the pattern space with five more lines (total: 6 at this point)
N - append another line
$s/foo/bar/g - when the last line is read, perform some operation on the entire contents of pattern space (the last seven lines of the file). Operations can be more complex than shown here
P - print the text before the first newline in pattern space
D - delete the text just printed and loop to the beginning of the script (the "append another line" step - the first instruction is skipped since it only applies to the first line in the file)
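To make the seven-line window visible, the same script can be run with a different operation on the last cycle. In this sketch (reasoned through for GNU sed; the ten-line input and the comma-joining substitution are illustrative choices, not part of the answer) the last seven lines are joined into a single line, while earlier lines pass through untouched.

```shell
# Ten numbered lines; the final seven are joined with commas.
last7=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 |
  sed '1{N;N;N;N;N};N;$s/\n/,/g;P;D')
printf '%s\n' "$last7"
# 1
# 2
# 3
# 4,5,6,7,8,9,10
```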
This might work for you:
sed ':a;1,6{$!N;ba};${s/foo/bar/g;q};N;D' file
Explanation:
Create a loop label. :a
Gather lines 1 to 6 in the pattern space (PS). 1,6{$!N;ba}
If it's the last line, process the PS and quit, therefore printing out the last seven lines. ${s/foo/bar/g;q}
If it's not the last line, append the next line to the PS. N
Delete up to the first newline and begin a new cycle without reading a new line. D

At what stage is sed's pattern space printed?

I have heard that for the pattern space, the maximum number of addresses is two.
And that sed goes through each line of the text file, and for each of them, runs through all the commands in the script expression or script file.
When does sed print the pattern space? Is it at the end of the text file, after it has done the last line? Or is it as the ending part of processing each line of the text file, just after it has run through all commands, it dumps the pattern space?
Can anybody demonstrate
a)the max limit of the pattern space being two?
b)the fact of when the pattern space is printed. And, if you can, please provide a textual source that says so too.
And why is it that, in my attempt to see the size of the pattern space, it looks like it can fit a lot, when this tutorial says:
http://www.thegeekstuff.com/2009/12/unix-sed-tutorial-7-examples-for-sed-hold-and-pattern-buffer-operations/
Sed G function
The G function appends the contents of the holding area to the contents of the pattern space. The former and new contents are separated by a newline. The maximum number of addresses is two.
Here is what I found when I tried, unsuccessfully, to hit the limit of two:
abc.txt is a text file containing just the character z
sed 'h;G;G;G;G;G;G;G;G' abc.txt
prints many z's, so I guess it can hold more than 2.
So I've misunderstood something.
An address is a way of selecting lines. Lines can be selected using zero, one or two addresses. This has nothing to do with the capacity of pattern space.
Consider the following input file:
aaa
bbb
ccc
ddd
eee
This sed command has zero addresses, so it processes every line:
s/./X/
Result:
Xaa
Xbb
Xcc
Xdd
Xee
This command has one address, it selects only the third line:
3s/./X/
Result:
aaa
bbb
Xcc
ddd
eee
An address of $ as in $s/./X/ would function the same way, but for the last line (regardless of the number of lines).
Here is a two-address command. In this case, it selects the lines based on their content. A single address command can do this, too.
/b/,/d/s/./X/
Result:
aaa
Xbb
Xcc
Xdd
eee
Pattern space is printed when given an explicit p or P command or when the script is complete for the current line of the input file (which includes ending the processing of the file with the q command) if the -n (suppress automatic printing) option is not in place.
Here's a demonstration of sed printing each line immediately upon receiving and processing it:
for i in {1..3}; do echo aaa$i; sleep 2; done | sed 's/./X/'
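The interaction between automatic printing, the p command and -n can also be seen in a short sketch (the two-line input and the variable names are illustrative):

```shell
# Default: pattern space is auto-printed at the end of each cycle,
# so an explicit p makes every line appear twice.
doubled=$(printf 'aaa\nbbb\n' | sed 'p')
printf '%s\n' "$doubled"
# aaa
# aaa
# bbb
# bbb

# With -n, automatic printing is off and only the explicit p prints.
once=$(printf 'aaa\nbbb\n' | sed -n 'p')
printf '%s\n' "$once"
# aaa
# bbb

# -n combined with a one-address p prints just line 2.
second=$(printf 'aaa\nbbb\n' | sed -n '2p')
printf '%s\n' "$second"
# bbb
```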
The capacity of pattern space (and hold space) has to do with the number of characters it can hold (and is implementation dependent) rather than the number of input lines. The newlines separating those lines are simply another character in that total. The G command simply appends a copy of hold space onto the end of what's in pattern space. Multiple applications of the G command appends that many copies.
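A small sketch of that growth, using a one-line input containing z as in the question (the variable name grown is illustrative): after h stores the line, each G adds one newline and one more copy.

```shell
# h copies the pattern space ("z") into the hold space;
# each G then appends a newline plus the hold space contents.
grown=$(printf 'z\n' | sed 'h;G;G;G')
printf '%s\n' "$grown"
# z
# z
# z
# z
```

All four copies sit in the pattern space at once and are printed together at the end of the single cycle.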
In the tutorial that you linked to, the statement "The maximum number of addresses is two." is somewhat ambiguous. It means that you can use zero, one or two addresses to select the lines that command applies to. As in the above examples, you could apply G to all lines, one line or a range of lines. Depending on the command, sed accepts no address at all, at most one address, or up to two addresses. See man sed under the Synopsis section for subheadings that group the commands by the number of addresses they accept.
From info sed:
3.1 How `sed' Works
'sed' maintains two data buffers: the active pattern space, and the
auxiliary hold space. Both are initially empty.
'sed' operates by performing the following cycle on each line of
input: first, 'sed' reads one line from the input stream, removes any
trailing newline, and places it in the pattern space. Then commands
are executed; each command can have an address associated to it:
addresses are a kind of condition code, and a command is only executed
if the condition is verified before the command is to be executed.
When the end of the script is reached, unless the '-n' option is in
use, the contents of pattern space are printed out to the output
stream, adding back the trailing newline if it was removed. Then the
next cycle starts for the next input line.
Unless special commands (like 'D') are used, the pattern space is
deleted between two cycles. The hold space, on the other hand, keeps
its data between cycles (see commands 'h', 'H', 'x', 'g', 'G' to move
data between both buffers).