Understanding sed hold-space work-flow - sed

I would like to print out the last line of a file which contains one or more integers. In the example below, that is "hippo 9991". I tried to achieve this with the gsed -n -r '/[0-9]+/h;x;$p' command, but this doesn't quite work:
$ cat testfile
dog
lion 34
elephant
tiger 7
hippo 9991
zebra
gepard
cat
$ cat testfile | gsed -n -r '/[0-9]+/h;x;$p'
gepard
$
Could somebody explain what exactly gsed -n -r '/[0-9]+/h;x;$p' does? As I understand, it should remove the trailing new-line character from line and read the line into pattern space. Then if the line in pattern space contains one or more integers, the line is put into hold space by replacing the previous data in hold space. This cycle is repeated until the last line which will be printed. Obviously I do not understand this correctly. More than a correct answer I would like to understand the work-flow of sed.

You almost have it. Here is what your script does:
/[0-9]+/h # if line contains a number, save the line to hold space
x # swap content of pattern space and hold space
$p # when on the last line print pattern space
You save the line to hold space then swap it back to pattern space. The contents of pattern space and hold space can be illustrated like this:
Line    Command       Pattern Space    Hold Space
~~~~    ~~~~~~~~~~~   ~~~~~~~~~~~~~    ~~~~~~~~~~
1       /[0-9]+/h     dog              (empty)
1       x             (empty)          dog
2       /[0-9]+/h     lion 34          lion 34
2       x             lion 34          lion 34
3       /[0-9]+/h     elephant         lion 34
3       x             lion 34          elephant
4       /[0-9]+/h     tiger 7          tiger 7
4       x             tiger 7          tiger 7
.
.
.
$       /[0-9]+/h     cat              gepard
$       x             gepard           cat
$       p             gepard           cat
What you really want is to only swap contents when the last line of the input file is reached. You can do this by grouping the x and p commands:
gsed -n -r '/[0-9]+/h; $ {x;p}' testfile
Output:
hippo 9991
The corresponding pattern space and hold space sequence is now:
Line    Command       Pattern Space    Hold Space
~~~~    ~~~~~~~~~~~   ~~~~~~~~~~~~~    ~~~~~~~~~~
1       /[0-9]+/h     dog              (empty)
2       /[0-9]+/h     lion 34          lion 34
3       /[0-9]+/h     elephant         lion 34
4       /[0-9]+/h     tiger 7          tiger 7
.
.
.
$       /[0-9]+/h     cat              hippo 9991
$       x             hippo 9991       cat
$       p             hippo 9991       cat
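The difference between the two scripts can be checked side by side. The following is only a sketch: it uses plain sed with /[0-9]/ instead of gsed -r with /[0-9]+/ (an assumption made for portability; both patterns match the same lines), and the temporary file and variable names are made up for the illustration.

```shell
#!/bin/sh
# Recreate the sample input from the question in a temporary file.
tmp=$(mktemp)
printf '%s\n' dog 'lion 34' elephant 'tiger 7' 'hippo 9991' zebra gepard cat > "$tmp"

# Original script: x runs on every line, so the saved match keeps
# getting swapped back out of the hold space.
buggy=$(sed -n '/[0-9]/h;x;$p' "$tmp")

# Fixed script: only save matching lines; swap and print once, on the last line.
fixed=$(sed -n '/[0-9]/h;${x;p;}' "$tmp")

echo "buggy: $buggy"
echo "fixed: $fixed"
rm -f "$tmp"
```

The first echo shows gepard and the second shows hippo 9991, matching the two tables above.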

The following works for me:
sed -n -r '/[0-9]+/ {h;x}; ${x;p}'
Here h and x run only when the line contains an integer; in your script, x runs on every line. (Right after h both spaces hold the same line, so that x changes nothing.) At the end, you don't want to print the last line but the last stored line, so you have to exchange them once more.

I cannot help you with the sed version, but an awk solution could easily do it.
awk '/[0-9]+/ {f=$0} END {print f}' file
hippo 9991

This might work for you (GNU sed):
sed '/[0-9]/h;$!d;x' file
If a line contains a digit, stash it in the hold space (any previously stashed line is overwritten). Delete every line but the last (deleted lines never get printed). On the last line, swap in the hold space. The natural flow of the program then prints the last line containing a digit, with no need for options.

I would do:
sed -n -r '/[0-9]+/{h}; ${x;p}' file
h overwrites the hold space with the current (matched) line.
On the last line ($), x exchanges the pattern and hold spaces and p prints what was in the hold space, which is the last line matching [0-9]+.

grep '[0-9]' testfile | tail -n 1
Has the disadvantage that we don't get to learn about "sed" but so much simpler.

Related

How to find only the first and last line of a file using sed

I have a file called error_log for the apache and I want to see the first line and the last line of this file using sed command. Would you please help me how can I do that?
I know how to do that with head and tail commands, but I'm curious if it's possible in sed command too.
I have read the man sed and have googled a lot but nothing is found unfortunately.
This might work for you (GNU sed):
sed '1b;$b;d' file
Any sed command can be prefixed by an address: a line number, $ (which represents the last line), or a regexp. If no address is present, the command applies to every line.
The normal sed cycle presents each line of input (less its newline) in the pattern space. The sed commands are then applied, and the final act of the cycle is to re-attach the newline and print the result.
The b command controls command flow; if by itself it jumps out of the following sed commands to the final act of the cycle i.e. where the newline is re-attached and the result printed.
The d command deletes the pattern space and since there is nothing to be printed no further processing is executed (including re-attaching the newline and printing the result).
Thus the solution above prints the first line and the last and deletes the rest.
Sed has some command-line options, one of which, -n, turns off the implicit printing of the pattern space. The p command prints the current state of the pattern space. Thus the dual of the above solution is:
sed -n '1p;$p' file
N.B. If the input file is only one line the first solution will only print one line whereas the second solution will print the same line twice. Also if more than one file is input both solutions will print the first line of the first file and last line of the last file unless the -i option is in place, in which case each file will be amended. The -s option replicates this without amending each file but streams the results to stdout as if each file is treated separately.
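The one-line edge case from the note above can be reproduced as follows. This is a sketch only: first_last is a made-up helper name, and it spells '1b;$b;d' as separate -e scripts, on the assumption that this avoids depending on how a given sed parses ';' directly after a b command.

```shell
#!/bin/sh
# Same logic as '1b;$b;d', written as separate -e scripts so that each
# b command is terminated by the end of its own script fragment.
first_last() { sed -e 1b -e '$b' -e d "$@"; }

many=$(printf '%s\n' 3 4 5 6 7 | first_last)   # several lines of input
one_b=$(printf 'foo\n' | first_last)           # one line, branch version
one_p=$(printf 'foo\n' | sed -n '1p;$p')       # one line, -n/p version

printf '%s\n' "$many" "--" "$one_b" "--" "$one_p"
```

With a single input line, the branch version prints it once, while the -n version prints it twice, because line 1 is also the last line and both p commands fire.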
This will work:
sed -n '1p ; $p' error_log
1p will print the first line and $p will print the last line.
As a suggestion, take a look at info sed, not only man sed. You can find some examples related to your question in paragraph 2.1.
First line:
sed '2,$d' error_log
Last line:
sed '$!d' error_log
Based on your new requirement to output nothing if the input file is just 1 line (see How to find only the first and last line of a file using sed):
awk 'NR==1{first=$0} {last=$0} END{if (NR>1) print first ORS last}'
Original answer:
This is one of those things that you can, at face value, do easily enough in sed:
$ seq 3 7
3
4
5
6
7
$ seq 3 7 | sed -n '1p; $p'
3
7
but then how to handle edge cases like one line of input is non-obvious, e.g. is this REALLY the correct output:
$ printf 'foo\n' | sed -n '1p; $p'
foo
foo
or is the correct output just:
foo
and if the latter, how do you tweak that sed command to produce that output? @potong suggested a GNU sed command:
$ printf 'foo\n' | sed '1b;$b;d'
foo
which works but may be GNU-only (idk) and more importantly doesn't look much like the command we started with so the tiniest change in requirements meant a complete rewrite using different constructs.
Now, how about if you want to enhance it to, say, only print the first and last line if the file contained foo? I expect that'd be another challenging exercise with sed and probably involve non-portable constructs too.
It's just all pointless to learn how to do this with sed when you can use a different tool like awk and do whatever you like in a simple, consistent, portable syntax:
$ seq 3 7 |
awk 'NR==1{first=$0} {last=$0} END{print first ORS last}'
3
7
$ printf 'foo\n' |
awk 'NR==1{first=$0} {last=$0} END{print first ORS last}'
foo
foo
$ printf 'foo\n' |
awk 'NR==1{first=$0} {last=$0} END{print first (NR>1 ? ORS last : "")}'
foo
$ printf '3\nfoo\n7\n' |
awk 'NR==1{first=$0} /foo/{f=1} {last=$0} END{if (f) print first (NR>1 ? ORS last : "")}'
3
7
$ printf '3\nbar\n7\n' |
awk 'NR==1{first=$0} /foo/{f=1} {last=$0} END{if (f) print first (NR>1 ? ORS last : "")}'
$
Notice that:
Every command looks like every other command.
A minor change in requirements leads to a minor change in the code, not a complete rewrite.
Once you learn how to do any given thing A, how to do similar things B, C, D, etc. just builds on top of the syntax you already used, you don't have to learn a completely different syntax.
Each of those commands will work using any awk in any shell on every UNIX box.
Now, how about if you want to do that for multiple files such as would be created by the following commands?
$ seq 3 7 > file1
$ seq 12 25 > file2
With awk you can just store the lines in an array for printing in the END:
$ awk 'FNR==1{first[++cnt]=$0} {last[cnt]=$0}
END{for (i=1;i<=cnt;i++) print first[i] ORS last[i]}' file1 file2
3
7
12
25
or with GNU awk you can print them from ENDFILE:
$ awk 'FNR==1{first=$0} {last=$0} ENDFILE{print first ORS last}' file1 file2
3
7
12
25
With sed? An exercise left for the reader.
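One possible take on that exercise, offered only as a sketch, is to let the shell loop over the files and run the same sed command on each. The temporary directory and the per_file variable are made up for the illustration, and a one-line file would still have its line printed twice.

```shell
#!/bin/sh
# Build the two sample files in a scratch directory.
dir=$(mktemp -d)
seq 3 7 > "$dir/file1"
seq 12 25 > "$dir/file2"

# Run sed once per file so that 1 and $ are relative to each file.
per_file=$(for f in "$dir/file1" "$dir/file2"; do sed -n '1p;$p' "$f"; done)

printf '%s\n' "$per_file"
rm -rf "$dir"
```

This prints 3, 7, 12 and 25, one per line, the same result as the awk versions above.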

Sed Process Substitution on Insert - Without Backslashes

I have a function that prints a header that needs to be applied across several files, but if I insert it with sed via command substitution, the lines prior to the last have a backslash \ on them.
E.g.
function print_header() {
cat << EOF
-------------------------------------------------------------------
$(date '+%B %d, %Y # ~ %r') ID:$(echo $RANDOM)
EOF
}
If I then take a file such as test.txt:
line 1
line 2
line 3
line 4
line 5
sed "1 i $(print_header | sed 's/$/\\/g')" test.txt
I get:
-------------------------------------------------------------------\
November 24, 2015 # ~ 11:18:28 AM ID:13187
line 1
line 2
line 3
line 4
line 5
Notice the troublesome backslash at the end of the first line, I'd like to not have that backslash appear. Any ideas?
I would use cat for that:
cat <(print_header) file > file_with_header
This behavior depends on the sed dialect; unfortunately, it is one of the things that varies with the version you have.
To simplify debugging, try specifying verbatim text. Here's one from a Debian system.
vnix$ sed '1i\
> foo\
> bar' <<':'
> hello
> goodbye
> :
foo
bar
hello
goodbye
Your diagnostics appear to indicate that your sed dialect does not in fact require the backslash after the first i.
Since you are generating the contents of the header programmatically anyway, my recommended solution would be to refactor the code so that you can avoid this conundrum. If you don't want cat <<EOF test.txt then maybe experiment with sed '1r/dev/stdin' <<EOF test.txt (I could not get 1r- to work, but /dev/stdin should be portable to any Linux.)
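One way to read that refactoring, sketched under some assumptions: keep sed out of the picture entirely and write the header and the file to a new file in one command group. The print_header below is a simplified stand-in for the one in the question (it uses the shell's PID instead of $RANDOM so that it runs in any POSIX shell), and the file names are made up.

```shell
#!/bin/sh
# Hypothetical stand-in for the question's print_header.
print_header() {
  printf '%s\n' '-------------------------------------------------------------------'
  printf '%s ID:%s\n' "$(date '+%B %d, %Y')" "$$"
}

work=$(mktemp -d)
printf 'line %s\n' 1 2 3 4 5 > "$work/test.txt"

# Emit the header, then the file, into a new file. sed never sees the
# header text, so no line-continuation backslashes are needed.
{ print_header; cat "$work/test.txt"; } > "$work/with_header.txt"

with_header=$(cat "$work/with_header.txt")
printf '%s\n' "$with_header"
rm -rf "$work"
```

The result has the two header lines followed by the five original lines, and no line ends in a backslash.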
Here is my kludgy fix, if you can find something more elegant I'll gladly credit you:
sed "1 i $(print_header | sed 's/$/\\/g;$s/$/\x01/')" test.txt | tr -d '\001'
This puts an unprintable SOH (\x01, the ASCII Start of Heading character) after the inserted text, which precludes the backslashes; I then run the output through tr to delete the SOH characters.

Delete lines by pattern in specific range of lines

I want to remove lines from file by regex pattern using sed just like in this question Delete lines in a text file that containing a specific string, but only inside a range of lines (not in the whole file). I want to do it starting from some line number till the end of file.
This is how I've done it in combination with tail:
tail -n +731 file|sed '/some_pattern/d' >> file
manually remove edited range in file from previous step
Is there a shorter way to do it with sed only?
Something like sed -i '731,1000/some_pattern/d' file?
You can use this sed,
sed -i.bak '731,1000{/some_pattern/d}' yourfile
Test:
$ cat a
1
2
3
13
23
4
5
$ sed '2,4{/3/d}' a
1
2
23
4
5
You need $ address to match end of file. With GNU sed:
sed -i '731,${/some_pattern/d;}' file
Note that this can be slower than tail -n +number, because sed will start processing at start of file instead of doing lseek() like tail.
(With BSD sed you need sed -i '' ...)
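The two range forms can be compared on the small test input used earlier. This is a sketch that writes to stdout instead of editing in place, so the -i differences between GNU and BSD sed do not come into play; the variable names are made up.

```shell
#!/bin/sh
# Same seven lines as the test file "a" above.
input=$(printf '%s\n' 1 2 3 13 23 4 5)

# Delete lines containing 3, but only within lines 2 through 4.
bounded=$(printf '%s\n' "$input" | sed '2,4{/3/d;}')

# Delete lines containing 3 from line 4 to the end of the input.
to_end=$(printf '%s\n' "$input" | sed '4,${/3/d;}')

printf '%s\n' "$bounded" "--" "$to_end"
```

The bounded range keeps 23 (line 5 is outside 2,4), while the open-ended range keeps the 3 on line 3 and drops both 13 and 23.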
sed is for simple substitutions on individual lines, that is all. For anything even marginally more interesting an awk solution will be clearer, more robust, portable, maintainable, extensible and better in just about every other desirable attribute of software.
Given this sample input file:
$ cat file
1
2
3
4
1
2
3
4
1
2
3
4
The following script will print every line except a line containing the number 3 that occurs after the 6th line of the input file:
$ awk '!(NR>6 && /3/)' file
1
2
3
4
1
2
4
1
2
4
Want to only do the deletion between lines 6 and 10? No problem:
$ awk '!(NR>6 && NR<10 && /3/)' file
1
2
3
4
1
2
4
1
2
3
4
Want the skipped lines written to a log file? No problem:
awk 'NR>6 && /3/{print > "log";next} {print}' file
Written to stderr?
awk 'NR>6 && /3/{print | "cat>&2";next} {print}' file
Want a count of how many lines you deleted also written to stderr?
awk 'NR>6 && /3/{print | "cat>&2"; cnt++; next} {print} END{print cnt | "cat>&2"}' file
ANYTHING you want to do additionally or differently will be easy and build on what you start with. Try doing any of the above, or just about anything else, with a sed script that satisfies your original requirement.
awk to the rescue!
awk '!(NR>=731 && /pattern/)' input > output

Extract every nth number from a txt file

So I have a txt file where I need to extract every third number and print it to separate file using Terminal. The txt file is just a long list of numbers, tab delimited:
18 25 0 18 24 5 18 23 5 18 22 8.2 ...
I know there is a way to do this using sed or awk, but so far I've only been able to extract every third line by using:
awk 'NR%3==1' testRain.txt > rainOnly.txt
So here's the answer (or rather, the answer I utilized!):
xargs -n1 < input.txt | awk '!(NR%3)' > output.txt
This gives you an output.txt that has every third number of the original file as a separate line.
A quick pipe line to extract every 3rd number:
$ xargs -n1 < file | sed '3~3!d'
0
5
5
8.2
If you don't want each number on a newline throw the result back through xargs:
$ xargs -n1 < file | sed '3~3!d' | xargs
0 5 5 8.2
Use redirection to store the output in a new file:
$ xargs -n1 < file | sed '3~3!d' | xargs > new_file
With awk using a simple for loop you could do:
$ awk '{for(i=3;i<=NF;i+=3)print $i}' file
0
5
5
8.2
or (adds a trailing tab):
$ awk '{for(i=3;i<=NF;i+=3)printf "%s\t",$i;print ""}' file
0 5 5 8.2
Or by setting the value of RS (adds trailing newline):
$ awk '!(NR%3)' RS='\t' file
0
5
5
8.2
$ awk '!(NR%3)' RS='\t' ORS='\t' file
0 5 5 8.2
You can print every third character by substituting the next two with nothing, globally. When the count straddles a newline, using Perl might be the simplest solution:
perl -p000 -e 's/(.)../$1/gs'
If you want the first, fourth etc character from every line, a line-oriented tool like sed suffices:
sed 's/\(.\)../\1/g'
Using grep -P
grep -oP '([^\t]+\t){2}\K[^\t\n]+' file
0
5
5
8.2
This might work for you (GNU sed):
sed -r 's/(\S+\s){3}/\1/g;s/\s$//' file
@user2718946
Your solution was close, but here you are without xargs.
awk 'NR%3==1' RS=" " file
18
18
18
18
Different start:
awk 'NR%3==0' RS=" " file
0
5
5
8.2
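If the numbers are tab-delimited and also spread over several lines, one more option is to put every number on its own line with tr and count lines with awk. This is a sketch rather than a tested recipe for the real data: every_third is a made-up helper name, the two-line sample input is invented from the numbers in the question, and input with leading whitespace would produce an empty first record.

```shell
#!/bin/sh
# Turn spaces and tabs into newlines, squeeze runs of newlines into one,
# then keep every third line (that is, every third number).
every_third() { tr -s ' \t' '\n\n' | awk 'NR % 3 == 0'; }

out=$(printf '18\t25\t0\t18\t24\t5\n18\t23\t5\t18\t22\t8.2\n' | every_third)
printf '%s\n' "$out"
```

Because the count runs over the whole stream, it keeps working when a group of three straddles a line break.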

SED: how to find only even numbers in a given file using sed

I am new to bash and having a tough time figuring this out.
Using sed, could anyone help me in finding only even numbers in a given file?
I figured out how to find all numbers starting from [0,2,4,6,8] using this:
sed -n 's/^[0-9]*[02468] /&/w even' <file
But this doesn't guarantee that the number is even for sure.
I am having trouble in finding if the matched number ends with either [0,2,4,6,8] for it to be even for sure.
So can any one help me out with this?
Your regex looks a bit weird and I am not sure what you want to do, but this should help:
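sed -r -n 's/^[0-9]*[02468] /even/p'
-r enables extended regex, and the p flag prints only the lines where a substitution was made (without it, -n suppresses all output). sed's regex syntax has no non-greedy *? quantifier, and none is needed here. The g flag is not needed either: it replaces every match within a line, and a pattern anchored with ^ can match only once per line.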
Your command should work fine assuming that there is a space after all even numbers and that they are all at the beginning of the lines:
$ echo 'foo
1231
2220
1254 ' | sed -n '/[0-9]*[02468] /p'
2220
1254
Also note that, as you don't actually do a substitution, you don't need the s command. Use an address (pattern) specifier and w command (like I did above with the p command).
To make sure that the even digit is the last, but is not necessarily followed by a space, you can do something like
$ echo 'foo
1231
2220
1254 ' | sed -n '/[0-9]*[02468]\($\|[^0-9]\)/p'
2220
1254
Actually, your case looks more like a use case for grep, not sed, because you do filtering rather than editing. Everything becomes easier with GNU grep, as you can do
$ echo 'foo
1231
2220
1254 ' | grep -P '\d*[02468](?!\d)'
2220
1254
Just append > even to the command to make it write to the file even.
$ cat file
1
2
3
498
57
12345678
$ awk '$0%2' file
1
3
57
$ awk '!($0%2)' file
2
498
12345678
Why don't you find the numbers ending with [02468] ?
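That suggestion could look like the sketch below. It assumes each line holds exactly one unsigned integer and nothing else; the sample input, including the non-numeric lines foo and 20x, is made up for the illustration.

```shell
#!/bin/sh
# Sample input: one candidate per line, some of them not numbers at all.
sample=$(printf '%s\n' 1 2 3 498 57 12345678 foo 20x)

# A line is an even number if it consists only of digits and the last
# digit is one of 0, 2, 4, 6, 8. Anchoring with ^ and $ checks the whole line.
evens_grep=$(printf '%s\n' "$sample" | grep -E '^[0-9]*[02468]$')
evens_sed=$(printf '%s\n' "$sample" | sed -n '/^[0-9]*[02468]$/p')

printf '%s\n' "$evens_grep"
```

Both commands select 2, 498 and 12345678; in the sed version, the p command can be swapped for w even to write the matches to a file named even, as in the question.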