Sed - Printing a pattern in a line matched more than once

Input-
X's Score 1725 and Y's Score 6248 in the match number 576
I want sed to output-
1725
6248
My code-
sed 's/Score[[:space:]]\([0-9]+\)/\1/g'
The above code outputs -
1725 and Y's 6248 in the match

You could try the following sed commands
#!/bin/sed -f
s/Score\s*/\
/g
s/\n\([0-9]\+\)[^\n]*/\
\1/g
s/^[^\n]*\n//
The first command replaces every "Score" with a newline, so that each number is now at the beginning of a line. To insert a newline character, we must write a backslash followed by an actual line break. That's why the command spans two lines.
The second command removes everything after the numbers that are at the beginning of a line. It matches a newline character followed by a number (this is how we know that the number was prefixed by a "Score" string). The number is captured into \1. Then it skips all characters up to the next newline character. When writing the replacement, we must restore the newline character and the number that was captured into \1.
Because the first line contains text before the first "Score", we must remove it. That's what the last command does: it matches all characters up to the first newline, starting from the beginning of the pattern space (i.e. our working buffer).
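To see what each of the three commands contributes, the script can be run cumulatively. This is only a sketch: it assumes GNU sed and uses its \n escape in the replacement instead of the backslash-newline form, and the variable names are made up for illustration.

```shell
# Assumes GNU sed (\s, \+ and \n in the replacement are GNU extensions).
input="X's Score 1725 and Y's Score 6248 in the match number 576"

# After command 1: every "Score" has become a newline.
stage1=$(printf '%s\n' "$input" | sed 's/Score\s*/\n/g')

# After commands 1 and 2: the text following each number is gone.
stage2=$(printf '%s\n' "$input" | sed 's/Score\s*/\n/g;s/\n\([0-9]\+\)[^\n]*/\n\1/g')

# After all three commands: the text before the first "Score" is gone too.
stage3=$(printf '%s\n' "$input" | sed 's/Score\s*/\n/g;s/\n\([0-9]\+\)[^\n]*/\n\1/g;s/^[^\n]*\n//')

printf '%s\n' "--- stage 1 ---" "$stage1" "--- stage 2 ---" "$stage2" "--- stage 3 ---" "$stage3"
```

Only the last stage is free of surrounding text; the first two are shown purely to make the intermediate pattern space visible.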
In a single command:
sed -e 's/Score\s*/\
/g;s/\n\([0-9]\+\)[^\n]*/\
\1/g;s/^[^\n]*\n//'
Hope this helps =)

One way using GNU sed (required because \b, which matches a word boundary, is a GNU extension):
echo "X's Score 1725 and Y's Score 6248 in the match number 576" | sed -e '
## Surround searched numbers (preceded by "Score") with newline characters.
s/\bScore \([0-9]\+\)\b/\n\1\n/g;
## Delete all numbers not preceded by a newline character.
s/\([^\n0-9]\)[0-9]\+/\1/g;
## Remove all other characters but numbers and newlines.
s/[^0-9\n]\+//g;
## Remove extra newlines.
s/\n\([0-9]\)/\1/g;
s/\n$//
'
It yields:
1725
6248

You could chain two egreps, the second one filtering the output of the first:
<infile egrep -o 'Score [0-9]+' | egrep -o '[0-9]+$'
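Running the two stages separately shows what each one keeps. This sketch swaps in grep -E, the non-deprecated spelling of egrep, and assumes a grep that supports -o (GNU and BSD grep do); the variable names are illustrative.

```shell
input="X's Score 1725 and Y's Score 6248 in the match number 576"

# Stage 1: keep only the "Score <number>" fragments, one per line.
labelled=$(printf '%s\n' "$input" | grep -Eo 'Score [0-9]+')

# Stage 2: from each fragment keep only the trailing number.
scores=$(printf '%s\n' "$labelled" | grep -Eo '[0-9]+$')

printf '%s\n' "$labelled" "$scores"
```

The trailing 576 never reaches the second stage because it is not preceded by "Score".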

Related

Sed command to break comma separated string up to a certain length

Example string
TEST,TEST1,TEST3,TEST4,TEST5
Expected output :
TEST,TEST1,
TEST3,TEST4,
TEST5
I want to split the data at a comma before the 15th position.
Try this:
sed 's/.\{,15\},/&\n/g' <<< "string" # or
sed 's/.\{,15\},/&\n/g' file
.\{,15\}, matches a part of the input consisting of 0 to 15 characters followed by a comma. Since sed is greedy while matching patterns, it will match as many characters as it can.
&\n expands to the matched part followed by a line feed.
s/REGEXP/REPLACEMENT/g replaces every match against REGEXP with REPLACEMENT.
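Applied to the example string from the question, the command can be checked like this. It is a sketch assuming GNU sed; the variable names are illustrative.

```shell
# Assumes GNU sed: \{,15\} (omitted lower bound) and \n in the
# replacement are GNU extensions.
string="TEST,TEST1,TEST3,TEST4,TEST5"

# Each match is "as many characters as possible, at most 15, then a comma".
split=$(printf '%s\n' "$string" | sed 's/.\{,15\},/&\n/g')

printf '%s\n' "$split"
```

Note that the comma is matched in addition to the 15 characters, so a piece can be up to 16 characters long; using \{,14\} instead would keep every piece, comma included, within 15.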

Remove blank lines in a file using sed

France 211 55 Europe
Japan 144 120 Asia
Germany 96 61 Europe
England 94 56 Europe
Taiwan 55 144 Asia
North Korea 44 2134 Asia
The above is my data file.
There are empty lines in it.
There are no spaces or tabs in those empty lines.
I want to remove all empty lines in the data.
I did a search, and "Delete empty lines using SED" gave the perfect answer.
Before that, I wrote two sed commands myself:
sed -r 's/\n\n+/\n/g' cou.data
sed 's/\n\n\n*/\n/g' cou.data
And I tried awk gsub, not successful either.
awk '{ gsub(/\n\n*/, "\n"); print }' cou.data
But they don't work and nothing changes.
Where did I go wrong with my sed code?
Use the following sed to delete all blank lines.
sed '/./!d' cou.data
Explanation:
/./ matches any line that contains at least one character.
! negates the selector, i.e. it makes the command apply to lines which do not match the selector, which in this case is the empty line(s).
d deletes the selected line(s).
cou.data is the path to the input file.
Where did you go wrong?
The following excerpt from How sed Works states:
sed operates by performing the following cycle on each line of input: first, sed reads one line from the input stream, removes any trailing newline, and places it in the pattern space. Then commands are executed; each command can have an address associated to it: addresses are a kind of condition code, and a command is only executed if the condition is verified before the command is to be executed.
When the end of the script is reached, unless the -n option is in use, the contents of pattern space are printed out to the output stream, adding back the trailing newline if it was removed. Then the next cycle starts for the next input line.
I've intentionally emboldened the parts which are pertinent to why your sed examples are not working. Given your examples:
They seem to disregard that sed reads one line at a time.
The trailing newlines, (\n\n and \n\n\n in your first and second example respectively), which you're trying to match don't actually exist. They've been removed by the time your regexp pattern is executed and then reinstated when the end of the script is reached.
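A quick way to confirm this is to ask sed to print every line whose pattern space contains a newline. The sketch below uses a shortened version of the data from the question; the variable names are illustrative.

```shell
# Sample with one single and one double blank gap, as in the question.
data=$(printf 'France 211 55 Europe\n\nJapan 144 120 Asia\n\n\nGermany 96 61 Europe')

# The pattern space never contains \n in the default line-by-line cycle,
# so this address matches nothing and nothing is printed.
with_newline=$(printf '%s\n' "$data" | sed -n '/\n/p')

# Deleting every line that has no character at all does the job.
cleaned=$(printf '%s\n' "$data" | sed '/./!d')

printf 'lines containing a newline: [%s]\n' "$with_newline"
printf '%s\n' "$cleaned"
```

Because the first command prints nothing, a substitution such as s/\n\n+/\n/g has nothing to act on either.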
RobC's answer is great if your lines are terminated by newline (linefeed or \n) only, because SED separates lines that way. If your lines are terminated by \r\n (or CRLF) - which you may have your reasons for doing even on a unix system - you will not get a match, because from sed's perspective the line isn't empty - the \r (CR) counts as a character. Instead you can try:
sed '/^\r$/d' filename
Explanation:
^ matches the start of the line
\r matches the carriage return
$ matches the end of the line
d deletes the selected line(s).
filename is the path to the input file.
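The difference between the two commands on CRLF input can be seen by counting the lines that survive. This is a sketch assuming GNU sed, which understands \r in a regular expression; the sample input and variable names are made up.

```shell
# Three CRLF-terminated lines; the middle one is "empty" apart from its \r.
# grep -c '' simply counts the lines it receives.

# /./!d keeps all three lines, because the lone \r counts as a character.
count_dot=$(printf 'a\r\n\r\nb\r\n' | sed '/./!d' | grep -c '')

# /^\r$/d removes the line that consists of a carriage return only.
count_cr=$(printf 'a\r\n\r\nb\r\n' | sed '/^\r$/d' | grep -c '')

printf 'lines left by /./!d: %s\n' "$count_dot"
printf 'lines left by the CR-aware delete: %s\n' "$count_cr"
```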

SED - replace string newline anything with string newline variable

I have the following content in a file
  dhcp_option_domain:
  - test.domain
And what I need to do is this:
Whenever the value 'dhcp_option_domain:' is followed by a newline and then ANY string, replace it with 'dhcp_option_domain:' followed by a newline and a variable.
I.e. if I set a variable dhcp_domain="different.com", then the string above would convert to:
  dhcp_option_domain:
  - different.com
Note that both lines have and need to maintain leading 2 spaces.
I do not want to just do a search and replace on 'test.domain' as I have a few cases to use this and the values could be different each time the sed command is run.
I have tried a few methods such as:
dhcp_domain="something.com"
sed -i 's|dhcp_option_domain:\n.*|dhcp_option_domain:\n - $dhcp_domain|g' filename
However, I cannot get it to work.
Thanks.
As the manual explains:
sed operates by performing the following cycle on each line of input: first, sed reads one line from the input stream, removes any trailing newline, and places it in the pattern space. Then commands are executed
Your regex (dhcp_option_domain:\n.*) does not match because there is no \n in the pattern space in the first place.
A possible solution:
sed '/dhcp_option_domain:$/{n;c\
  - '"$dhcp_domain"'
}'
The /dhcp_option_domain:$/ part is an address. The following command is only executed on lines matching that pattern.
The { } command groups multiple commands into a single block.
The n command prints out the current pattern space and replaces it by the next line of input.
The c\ command replaces the current pattern space by whatever follows in the script. Here it gets a bit tricky. We have:
a literal newline in the sed program (required after c\), then
- (placing those characters in the pattern space literally), then
' (part of shell syntax, terminating the single-quoted part started by sed '...), then
" (starting a double-quoted part), then
$dhcp_domain (which, because it's in a double-quoted part, interpolates the contents of the dhcp_domain shell variable), then
" (terminating the double-quoted part), then
' (starting another single-quoted part), then
a literal newline again (terminating the text after c\), then
} (closing the block started by {).
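Put together and fed the two lines from the question on standard input, the command can be tried as follows. This is a sketch assuming GNU sed, which keeps the leading whitespace of the text after c\; the two-space indentation is taken from the question's description and the variable name result is illustrative.

```shell
dhcp_domain="different.com"

# The shell joins three pieces into one sed script:
#   '...c\<newline>  - '   "$dhcp_domain"   '<newline>}'
result=$(printf '  dhcp_option_domain:\n  - test.domain\n' |
  sed '/dhcp_option_domain:$/{n;c\
  - '"$dhcp_domain"'
}')

printf '%s\n' "$result"
```

If the variable could contain a backslash or a newline, it would need escaping before being spliced into the script, since sed processes the text after c\.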
By default, sed works line by line (using the newline character to distinguish lines)
$ cat ip.txt
foo baz
  dhcp_option_domain:
  - test.domain
123
  dhcp_option_domain:
$ dhcp_domain='something.com'
$ sed '/^  dhcp_option_domain:/{n; s/.*/  - '"$dhcp_domain"'/}' ip.txt
foo baz
  dhcp_option_domain:
  - something.com
123
  dhcp_option_domain:
/^  dhcp_option_domain:/ condition to match
{} to group more than one command to be executed when this condition is satisfied
n get next line
s/.*/  - '"$dhcp_domain"'/ replace it as required - note that shell variables won't be expanded inside single quotes, see sed substitution with bash variables for details
note that last line in the file didn't trigger the change as there was no further line
tested on GNU sed, syntax might vary for other implementations
From GNU sed manual
n
If auto-print is not disabled, print the pattern space, then,
regardless, replace the pattern space with the next line of input. If
there is no more input then sed exits without processing any more
commands.
This might work for you (GNU sed):
sed '/dhcp_option_domain:$/{p;s// - '"${var}"'/;n;d}' file
Match on dhcp_option_domain:, print it, substitute the new domain name (maintaining indent), print the current line and fetch the next (n) and delete it.

SED extract value

Can anybody please help me use sed to get the values of time, lat and lon from the text below?
{"class":"TPV","tag":"MID2","device":"/dev/ttyUSB0","mode":3,"time":"2012-10-02T10:43:21.000Z","ept":0.005,"lat":55.190682291,"lon":25.265912847,"alt":19.149,"epx":58.300,"epy":74.796,"epv":144.575,"track":148.2723,"speed":1.623,"climb":-1.471,"eps":149.59}
$ grep -oP '"lat":\K[\d.]+' file
$ grep -oP '"lon":\K[\d.]+' file
$ grep -oP '"time":"\K[^"]+' file
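In a script, the three values would typically be captured into variables. The sketch below assumes GNU grep built with PCRE support (-P); the shortened record and the variable names are illustrative, and the optional -? is an addition so that negative coordinates would also match.

```shell
# Shortened, illustrative version of the record from the question.
record='{"mode":3,"time":"2012-10-02T10:43:21.000Z","ept":0.005,"lat":55.190682291,"lon":25.265912847,"alt":19.149}'

# \K discards everything matched so far, so -o prints only the value.
lat=$(printf '%s\n' "$record" | grep -oP '"lat":\K-?[\d.]+')
lon=$(printf '%s\n' "$record" | grep -oP '"lon":\K-?[\d.]+')
time=$(printf '%s\n' "$record" | grep -oP '"time":"\K[^"]+')

printf 'time=%s lat=%s lon=%s\n' "$time" "$lat" "$lon"
```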
With egrep and sed
<infile egrep -o '"(lat|lon|time)":"?[^,]*' | sed 's/[^:]*://'
Output:
"2012-10-02T10:43:21.000Z"
55.190682291
25.265912847
Append tr -d '"' to the pipeline if you don't like double-quotes.
With sed alone
<infile sed -r 's/"(lat|lon|time)":"?([^,"]*)/\n\2\n/g' | sed -n '2~2p'
Output:
2012-10-02T10:43:21.000Z
55.190682291
25.265912847
The first sed separates matches so they will be on every other line, the second picks them out.
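Looking at the output of the first sed on its own makes the "every other line" layout visible. This is a sketch assuming GNU sed (-r, \n in the replacement and the first~step address are GNU features); the shortened record and variable names are illustrative.

```shell
record='{"mode":3,"time":"2012-10-02T10:43:21.000Z","ept":0.005,"lat":55.190682291,"lon":25.265912847,"alt":19.149}'

# First sed only: each captured value now sits alone on an even-numbered line,
# with the leftover text on the odd-numbered lines around it.
separated=$(printf '%s\n' "$record" | sed -r 's/"(lat|lon|time)":"?([^,"]*)/\n\2\n/g')
printf '%s\n' "$separated"

# Second sed: print line 2 and every second line after it.
values=$(printf '%s\n' "$separated" | sed -n '2~2p')
printf '%s\n' "$values"
```

This relies on the three keys being the only matches on the line; an extra match would shift nothing, but two adjacent matches with no text between them would still each be surrounded by their own newlines.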
With tr and grep
<infile tr ',' '\n' | grep 'time\|lon\|lat'
Output:
"time":"2012-10-02T10:43:21.000Z"
"lat":55.190682291
"lon":25.265912847
This is fairly trivial with GNU awk:
awk -F, '{ for (i=1; i<=NF; i++) if ($i ~ /time|lat|lon/) { match($i, /^\"([^\"]+)\":\"?([^\"]+)\"?/, array); printf "%s: %s\n", array[1], array[2] } }' file.txt
Results:
time: 2012-10-02T10:43:21.000Z
lat: 55.190682291
lon: 25.265912847
I would do (as a sed script):
#!/bin/sed -f
h;G;G
s/[^\n]*"lat"\s*:\s*\([0-9.]*\)[^\n]*/\1/
s/[^\n]*"lon"\s*:\s*\([0-9.]*\)[^\n]*/\1/
s/\n[^\n]*"time"\s*:\s*"\([^"]*\)".*$/\
\1/
The three commands on the first line (h;G;G) copy the line twice. They do this by copying the input line into an auxiliary buffer (called the hold space) with the "h" command, and then appending the contents of the hold space to the pattern space (i.e. the working buffer) with the "G" command, twice. Now we have three copies of the line.
For simplicity and to be more general, there are three separate commands to extract the data, but the format is analogous:
Skip some characters until we find our key. Beware that in the first two commands we must skip only characters that aren't newlines ([^\n]*), otherwise the match would spill over into the lines below as a consequence of greedy matching (i.e. if you skip as many characters as you can before finding a "lat", you will skip the first two lines because the third line also contains "lat"). In the last command, you may skip any character (.*), but you must first skip a newline character to prevent it from matching the previous lines.
Skip the key
Skip zero or more white space characters (\s*)
Skip the colon
Skip more optional white space characters
Capture the data. A capture is specified by the backslashed parentheses (i.e. the \( and the \)), and it will store the input that matches the expression between the parentheses into an auxiliary "variable" called \1 (if you have more than one capture group, then the second will be called \2, the third \3, and so on up to \9). In the first two commands we match a series of digits or periods ([0-9.]*). In the last command, we capture any characters that aren't a double quote ([^"]*), but we also skip one double quote before and one after the capture group (i.e. skip the opening and closing double quotes).
Skip more characters. We skip as many characters that aren't a newline as we can, so we effectively skip to the end of the line.
Finally, in each command we replace the match with the capture result. On the last command, since we match and therefore skip the newline separating the second and the third line, we must include it in the replacement. To include it, we have to add a backslash and an actual newline character after it. That's why the replacement is split into two lines.
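The same script can be tried as a one-liner on a shortened record. This is a sketch assuming GNU sed; it uses the \n escape in the last replacement instead of the backslash-newline form, and the record and variable names are illustrative.

```shell
record='{"time":"2012-10-02T10:43:21.000Z","lat":55.190682291,"lon":25.265912847}'

# h;G;G makes three copies of the line; each s command then reduces
# one copy to a single value (lat, lon, time, in that order).
extracted=$(printf '%s\n' "$record" | sed \
  -e 'h;G;G' \
  -e 's/[^\n]*"lat"\s*:\s*\([0-9.]*\)[^\n]*/\1/' \
  -e 's/[^\n]*"lon"\s*:\s*\([0-9.]*\)[^\n]*/\1/' \
  -e 's/\n[^\n]*"time"\s*:\s*"\([^"]*\)".*$/\n\1/')

printf '%s\n' "$extracted"
```

The output order follows the order of the substitutions, not the order of the keys in the record.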
Hope this helps =)

Can I use the sed command to replace multiple empty lines with one empty line?

I know there is a similar question on SO, "How can I replace mutliple empty lines with a single empty line in bash?", but my question is: can this be implemented using just the sed command?
Thanks
Give this a try:
sed '/^$/N;/^\n$/D' inputfile
Explanation:
/^$/N - match an empty line and append the next input line to the pattern space.
; - command delimiter, allows multiple commands on one line, can be used instead of separating commands into multiple -e clauses for versions of sed that support it.
/^\n$/D - if the pattern space now consists of a single newline, i.e. two consecutive empty lines were joined by N, delete the first of them (more generally, D deletes the beginning of the pattern space up to and including the first newline) and restart the cycle with what remains, without reading a new line of input.
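A small run on made-up input shows the effect of the N/D pair. This sketch was reasoned through for GNU sed; the input and the variable name are illustrative.

```shell
# Five input lines: "a", three empty lines, "b".
squeezed=$(printf 'a\n\n\n\nb\n' | sed '/^$/N;/^\n$/D')

# The run of three empty lines has collapsed to a single empty line.
printf '%s\n' "$squeezed"
```

Each time D fires, one empty line is dropped and the cycle restarts on the remaining empty pattern space, so the loop continues until N pulls in a non-empty line.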
You can do this by removing empty lines first and then appending an empty line with the G command:
sed '/^$/d;G' text.txt
Edit2: the above command adds an empty line after every line, so consecutive non-empty lines become separated too. If this is not desired, you could do:
sed -n '1{/^$/p};{/./,/^$/p}'
Or, if you don't mind that all leading empty lines will be stripped, it may be written as:
sed -n '/./,/^$/p'
since the first expression just evaluates the first line, and prints it if it is blank.
Here the -n option suppresses auto-printing of the pattern space, /./,/^$/ defines a range that starts at a line containing at least one character and ends at the next empty line, and p prints the lines in this range.
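The range form can be checked on a small made-up input with leading and repeated empty lines; the variable name is illustrative.

```shell
# Input: two leading empty lines, "a", "b", three empty lines, "c".
ranged=$(printf '\n\na\nb\n\n\n\nc\n' | sed -n '/./,/^$/p')

# Leading empty lines are dropped; each range ends with one empty line.
printf '%s\n' "$ranged"
```

Lines between two ranges, that is the second and later empty lines of a gap, are never printed, which is what squeezes the gap down to one line.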