SED extract value - sed

Can anybody please help me use sed to get the values of time, lat and lon from the text below?
{"class":"TPV","tag":"MID2","device":"/dev/ttyUSB0","mode":3,"time":"2012-10-02T10:43:21.000Z","ept":0.005,"lat":55.190682291,"lon":25.265912847,"alt":19.149,"epx":58.300,"epy":74.796,"epv":144.575,"track":148.2723,"speed":1.623,"climb":-1.471,"eps":149.59}

$ grep -oP '"lat":\K[\d.]+' file
$ grep -oP '"lon":\K[\d.]+' file
$ grep -oP '"time":"\K[^"]+' file

With egrep and sed
<infile egrep -o '"(lat|lon|time)":"?[^,]*' | sed 's/[^:]*://'
Output:
"2012-10-02T10:43:21.000Z"
55.190682291
25.265912847
Append tr -d '"' to the pipeline if you don't like double-quotes.
With sed alone
<infile sed -r 's/"(lat|lon|time)":"?([^,"]*)/\n\2\n/g' | sed -n '2~2p'
Output:
2012-10-02T10:43:21.000Z
55.190682291
25.265912847
The first sed separates matches so they will be on every other line, the second picks them out.
With tr and grep
<infile tr ',' '\n' | grep 'time\|lon\|lat'
Output:
"time":"2012-10-02T10:43:21.000Z"
"lat":55.190682291
"lon":25.265912847

This is fairly trivial with GNU awk:
awk -F, '{ for (i=1; i<=NF; i++) if ($i ~ /time|lat|lon/) { match($i, /^\"([^\"]+)\":\"?([^\"]+)\"?/, array); printf "%s: %s\n", array[1], array[2] } }' file.txt
Results:
time: 2012-10-02T10:43:21.000Z
lat: 55.190682291
lon: 25.265912847

I would do (as a sed script):
#!/bin/sed -f
h;G;G
s/[^\n]*"lat"\s*:\s*\([0-9.]*\)[^\n]*/\1/
s/[^\n]*"lon"\s*:\s*\([0-9.]*\)[^\n]*/\1/
s/\n[^\n]*"time"\s*:\s*"\([^"]*\)".*$/\
\1/
The three commands on the first line (h;G;G) make two extra copies of the input line. The "h" command copies the line into an auxiliary buffer (called the hold space), and each "G" command then appends the contents of the hold space to the pattern space (i.e. the working buffer). Now we have three copies of the line.
For simplicity and to be more general, there are three separate commands to extract the data, but the format is analogous:
Skip some characters until we find our key. Beware that we must skip only characters that aren't newlines ([^\n]*) in the first two commands; otherwise, as a consequence of the greedy matching, they would affect the lines below them (i.e. if you skip as many characters as you can before finding a "lat", you will skip the first two lines, because the third line also contains "lat"). In the last command you may skip any character at the end (.*), but you must first match a newline character to prevent the pattern from matching the previous lines.
Skip the key
Skip zero or more white space characters (\s*)
Skip the colon
Skip more optional white space characters
Capture the data. A capture is specified by the backslashed parentheses (i.e. the \( and the \)), and it stores the input that matches the expression between the parentheses into an auxiliary "variable" called \1 (if you have more than one capture group, then the second will be called \2, the third \3, and so on up to \9). In the first two commands we match a series of digits or periods ([0-9.]*). In the last command, we capture any characters that aren't a double quote ([^"]*), but we also skip a double quote before and one after the capture group (i.e. skip the opening and closing double quotes).
Skip more characters. We skip as many characters that aren't a newline as we can, so we effectively skip to the end of the line.
Finally, in each command we replace the match with the capture result. On the last command, since we match and therefore skip the newline separating the second and the third line, we must include it in the replacement. To include it, we have to add a backslash and an actual newline character after it. That's why the replacement is split into two lines.
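As a rough illustration of the steps above, the sketch below runs the same three substitutions on a shortened, made-up record (not the full line from the question). The helper name extract_fields is hypothetical, and GNU sed is assumed, since \s and \n inside a bracket expression are GNU extensions. The replacement uses GNU's \n escape instead of a backslash followed by a literal line break.

```shell
#!/bin/sh
# Shortened, hypothetical record used only for illustration.
record='{"time":"2012-10-02T10:43:21.000Z","lat":55.19,"lon":25.26}'

# Show the pattern space after h;G;G: three identical copies of the line.
printf '%s\n' "$record" | sed 'h;G;G'

# extract_fields (hypothetical name) reduces each copy to one value:
# copy 1 -> lat, copy 2 -> lon, copy 3 -> time.
extract_fields() {
    sed -e 'h;G;G' \
        -e 's/[^\n]*"lat"\s*:\s*\([0-9.]*\)[^\n]*/\1/' \
        -e 's/[^\n]*"lon"\s*:\s*\([0-9.]*\)[^\n]*/\1/' \
        -e 's/\n[^\n]*"time"\s*:\s*"\([^"]*\)".*$/\n\1/'
}

printf '%s\n' "$record" | extract_fields
```

The values come out in the order lat, lon, time, one per line, because each substitution rewrites one copy of the record in place.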
Hope this helps =)

Related

GREP Print Blank Lines For Non-Matches

I want to extract strings between two patterns with GREP, but when no match is found, I would like to print a blank line instead.
Input
This is very new
This is quite old
This is not so new
Desired Output
is very
is not so
I've attempted:
grep -o -P '(?<=This).*?(?=new)'
But this does not preserve the second blank line in the above example. Have searched for over an hour, tried a few things but nothing's worked out.
Will happily use a solution in sed if that's easier!
You can use
#!/bin/bash
s='This is very new
This is quite old
This is not so new'
sed -En 's/.*This(.*)new.*|.*/\1/p' <<< "$s"
This yields:
is very
is not so
Details:
E - enables POSIX ERE regex syntax
n - suppresses default line output
s/.*This(.*)new.*|.*/\1/ - finds any text, This, any text (captured into Group 1, \1), new, and then any text again, or else the whole string (in sed, the line), and replaces the match with the Group 1 value.
p - prints the result of the substitution.
And this is what you need for your actual data:
sed -En 's/.*"user_ip":"([^"]*).*|.*/\1/p'
The [^"]* matches zero or more chars other than a " char.
With your shown samples, please try following awk code.
awk -F'This\\s+|\\s+new' 'NF==3{print $2;next} NF!=3{print ""}' Input_file
OR
awk -F'This\\s+|\\s+new' 'NF==3{print $2;next} {print ""}' Input_file
Explanation: set This\\s+ OR \\s+new as the field separator for every line of Input_file. In the main program, if NF (the number of fields) is 3, print the 2nd field (next then skips straight to the next line). Otherwise, i.e. when NF is not 3, print a blank line.
sed:
sed -E '
/This.*new/! s/.*//
s/.*This(.*)new.*/\1/
' file
First line: for lines not matching "This.*new", remove all characters, leaving a blank line.
Second line: for lines matching the pattern, keep only the "middle" text.
Note that this is not the PCRE non-greedy match: the line
This is new but that is not new
will produce the output
is new but that is not
To continue to use PCRE, use perl:
perl -lpe '$_ = /This(.*?)new/ ? $1 : ""' file
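To make the greedy behaviour concrete, here is a small sketch that wraps the two-command sed script above in a shell function and feeds it three lines, including the one with two occurrences of "new". The function name extract_between is made up for the example.

```shell
#!/bin/sh
# extract_between is a hypothetical helper wrapping the sed script above.
extract_between() {
    sed -E '
        /This.*new/! s/.*//
        s/.*This(.*)new.*/\1/
    '
}

# One output line per input line; non-matching lines become blank,
# and the greedy (.*) runs up to the LAST "new" on the line.
printf '%s\n' \
    'This is very new' \
    'This is quite old' \
    'This is new but that is not new' | extract_between
```

The second output line is empty, and the third contains " is new but that is not ", which is where sed differs from the non-greedy PCRE version.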
This might work for you:
sed -E 's/.*This(.*)new.*|.*/\1/' file
If the first match is made, the line is replaced by everything between This and new.
Otherwise the second match will remove everything.
N.B. The substitution will always match one of the conditions. The solution was suggested by Wiktor Stribiżew.

sed pattern negation with a comma separated line

I have a text file full of lines looking like:
Female,"$0 to $25,000",Arlington Heights,0,60462,ZD111326,9/18/13 0:21,Disk Drive
I am trying to change all of the commas , to pipes |, except for the commas within the quotes.
Trying to use sed (which I am new to)... and it is not working. Using:
sed '/".*"/!s/\,/|/g' textfile.csv
Any thoughts?
As a test case, consider this file:
Female,"$0 to $25,000",Arlington Heights,0,60462,ZD111326,9/18/13 0:21,Disk Drive
foo,foo,"x,y,z",foo,"a,b,c",foo,"yes,no"
"x,y,z",foo,"a,b,c",foo,"yes,no",foo
Here is a sed command to replace non-quoted commas with pipe symbols:
$ sed -r ':a; s/^([^"]*("[^"]*"[^"]*)*),/\1|/g; t a' file
Female|"$0 to $25,000"|Arlington Heights|0|60462|ZD111326|9/18/13 0:21|Disk Drive
foo|foo|"x,y,z"|foo|"a,b,c"|foo|"yes,no"
"x,y,z"|foo|"a,b,c"|foo|"yes,no"|foo
Explanation
This looks for commas that appear after pairs of double quotes and replaces them with pipe symbols.
:a
This defines a label a.
s/^([^"]*("[^"]*"[^"]*)*),/\1|/g
If 0, 2, 4, or any other even number of quotes precede a comma on the line, then replace that comma with a pipe symbol.
^
This matches at the start of the line.
(
This starts the main grouping (\1).
[^"]*
This looks for zero or more non-quote characters.
("[^"]*"[^"]*)*
The * outside the parens means that we are looking for zero or more of the pattern inside the parens. The pattern inside the parens consists of a quote, any number of non-quotes, a quote and then any number of non-quotes.
In other words, this grouping only matches pairs of quotes. Because of the * outside the parens, it can match any even number of quotes.
)
This closes the main grouping
,
This requires that the grouping be followed by a comma.
t a
If the previous s command successfully made a substitution, then the test command tells sed to jump back to label a and try again.
If no substitution was made, then we are done.
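The effect of the loop can be seen by comparing a single pass with the looping version on a short, made-up line. Because the leading group is greedy, each pass replaces the last remaining unquoted comma, so one pass alone is not enough. This sketch assumes a sed that accepts -E (GNU sed also accepts -r, as used above); the function names are invented for the example.

```shell
#!/bin/sh
line='a,"b,c",d'   # hypothetical sample line

# Single pass: only the last unquoted comma is replaced.
single_pass() {
    sed -E 's/^([^"]*("[^"]*"[^"]*)*),/\1|/'
}

# With the label and "t a", the substitution repeats until it fails.
all_passes() {
    sed -E ':a; s/^([^"]*("[^"]*"[^"]*)*),/\1|/; t a'
}

printf '%s\n' "$line" | single_pass   # a,"b,c"|d
printf '%s\n' "$line" | all_passes    # a|"b,c"|d
```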
Using awk could be easier:
kent$ cat f
foo,foo,"x,y,z",foo,"a,b,c",foo,"yes,no"
Female,"$0 to $25,000",Arlington Heights,0,60462,ZD111326,9/18/13 0:21,Disk Drive
kent$ awk -F'"' -v OFS='"' '{for(i=1;i<=NF;i++)if(i%2)gsub(",","|",$i)}7' f
foo|foo|"x,y,z"|foo|"a,b,c"|foo|"yes,no"
Female|"$0 to $25,000"|Arlington Heights|0|60462|ZD111326|9/18/13 0:21|Disk Drive
I suggest a language with a proper CSV parser. For example:
ruby -rcsv -ne 'puts CSV.generate_line(CSV.parse_line($_), :col_sep=>"|")' file
Female|$0 to $25,000|Arlington Heights|0|60462|ZD111326|9/18/13 0:21|Disk Drive
Here I would use GNU awk's FPAT. It defines what a field looks like, whereas FS defines what the separator is. Then you can just set the output separator to |
awk '{$1=$1}1' OFS=\| FPAT="([^,]+)|(\"[^\"]+\")" file
Female|"$0 to $25,000"|Arlington Heights|0|60462|ZD111326|9/18/13 0:21|Disk Drive
If your awk does not support FPAT, this can be used:
awk -F, '{for (i=1;i<NF;i++) {c+=gsub(/\"/,"&",$i);printf "%s"(c%2?FS:"|"),$i}print $NF}' file
Female|"$0 to $25,000"|Arlington Heights|0|60462|ZD111326|9/18/13 0:21|Disk Drive
sed 's/"\(.*\),\(.*\)"/"\1##HOLD##\2"/g;s/,/|/g;s/##HOLD##/,/g'
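This matches the text in quotes and puts a placeholder in place of its comma, then switches all the other commas to pipes and turns the placeholder back into a comma. You can change the ##HOLD## text to whatever you want. Note that because .* is greedy, only one comma per line is protected this way, so it works for the sample line but not for lines with several quoted commas.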

Sed - Printing a pattern in a line matched more than once

Input-
X's Score 1725 and Y's Score 6248 in the match number 576
I want sed to output-
1725
6248
My code-
sed 's/Score[[:space:]]\([0-9]+\)/\1/g'
The above code outputs -
1725 and Y's 6248 in the match
You could try the following sed commands
#!/bin/sed -f
s/Score\s*/\
/g
s/\n\([0-9]\+\)[^\n]*/\
\1/g
s/^[^\n]*\n//
The first command replaces all "Score"s with newlines, so now all numbers are at the beginning of a line. To insert a newline character, we must write a backslash followed by an actual line break. That's why the command spans two lines.
The second command removes everything after the numbers that are at the beginning of a line. It matches a newline character followed by a number (this is how we know that this number was prefixed by a "Score" string). The number is captured into variable \1. Then it skips all characters up to the next newline character. When writing the replacement, we must restore the newline character and the number that was captured into \1.
Because the first line contains text before the first "Score", we must remove it. That's what the last command does: it matches all characters up to the first newline, starting from the beginning of the contents of the pattern space (i.e. our working buffer).
In a single command:
sed -e 's/Score\s*/\
/g;s/\n\([0-9]\+\)[^\n]*/\
\1/g;s/^[^\n]*\n//'
Hope this helps =)
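The three stages can be inspected one at a time. The sketch below applies them cumulatively to the input line from the question. It assumes GNU sed (for \s, \+ and \n in the replacement), and the variable names are only for illustration.

```shell
#!/bin/sh
input="X's Score 1725 and Y's Score 6248 in the match number 576"

# Stage 1: every "Score" becomes a newline, so each number starts a line.
stage1='s/Score\s*/\n/g'
# Stage 2: keep the digits after each newline, drop the rest of that line.
stage2='s/\n\([0-9]\+\)[^\n]*/\n\1/g'
# Stage 3: remove the text before the first newline.
stage3='s/^[^\n]*\n//'

printf '%s\n' "$input" | sed -e "$stage1"
echo '---'
printf '%s\n' "$input" | sed -e "$stage1" -e "$stage2"
echo '---'
printf '%s\n' "$input" | sed -e "$stage1" -e "$stage2" -e "$stage3"
```

After the last stage only the two scores remain; the trailing 576 is dropped because it never followed a "Score".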
One way, using GNU sed because \b (which matches a word boundary) is an extension:
echo "X's Score 1725 and Y's Score 6248 in the match number 576" | sed -e '
## Surround searched numbers (preceded by "Score") with newline characters.
s/\bScore \([0-9]\+\)\b/\n\1\n/g;
## Delete all numbers not preceded by a newline character.
s/\([^\n0-9]\)[0-9]\+/\1/g;
## Remove all other characters but numbers and newlines.
s/[^0-9\n]\+//g;
## Remove extra newlines.
s/\n\([0-9]\)/\1/g;
s/\n$//
' infile
It yields:
1725
6248
You could chain two egreps:
<infile egrep -o 'Score [0-9]+' | egrep -o '[0-9]+$'

How to use sed-awk-gawk to display a matched string

I've got a file called 'res' that's 29374 characters of http data in a one-line string. Inside it, there are several http links, but I only want to display those that end in '/idNNNNNNNNN' where N is a digit. In fact I'm only interested in the string 'idNNNNNNNNN'.
I've tried with:
cat res | sed -n '0,/.*\(id[0-9]*\).*/s//\1/p'
but I get the whole file.
Do you know a way to do it?
perl -n -E 'say $1 while m!/id(\d{9})!g' input-file
should work. That assumes exactly 9 digits; that's the {9} in the above. You can match 8 or 9 ({8,9}), 8 or more ({8,}), up to 9 ({0,9}), etc.
Example of this working:
$ echo -n 'junk jumk http://foo/id231313 junk lalala http://bar/id23123 asda' | perl -n -E 'say $1 while m!id(\d{0,9})!g'
231313
23123
That's with the 0 to 9 variant, of course.
If you're stuck with a pre-5.10 perl, use -e instead of -E and print "$1\n" instead of say $1.
How it works
First, the two command-line arguments to Perl. -n tells Perl to read input from standard input or files given on the command line, line by line, setting $_ to each line. $_ is perl's default target for a lot of things, including regular expression matches. -E merely tells Perl that the next argument is a Perl one-liner, using the new language features (vs. -e which does not use the 5.10 extensions).
So, looking at the one liner: say means to print out some value, followed by a newline. $1 is the first regular expression capture (captures are made by parentheses in regular expressions). while is a looping construct, which you're probably familiar with. m is the match operator, the ! after it is the regular expression delimiter (normally, you see / here, but since the pattern contains / it's easier to use something else, so you don't have to escape the / as \/). /id(\d{9}) is the regular expression to match. Keep in mind that the delimiter is !, so the / is not special, it just matches a literal /. The parentheses form a capture group, so $1 will be the number. The ! is the delimiter, followed by g which means to match as many times as possible (as opposed to once). This is what makes it pick up all the URLs in the line, not just the first. As long as there is a match, the m operator will return a true value, so the loop will continue (and run that say $1, printing out the match).
Two-sed solution
I think this is one way to do this with only sed. Much more complicated!
echo 'junk jumk http://foo/id231313 junk lalala http://bar/id23123 asda' | \
sed 's!http://!\nhttp://!g' | \
sed 's!^.*/id\([0-9]*\).*$!\1!'
cat res | perl -ne 'chomp; print "$1\n" if m/\/(id\d*)/'
The trouble is that sed and grep and awk work on lines, and you've only got one line. So, you probably need to split things up so you have more than one line -- then you can make the normal tools work.
tr ':' '\012' < res |
sed -n 's%.*/\(id[0-9][0-9]*\).*%\1%p'
This takes advantage of URLs containing colons and maps colons to newlines with tr, then uses sed to pick up anything up to a slash, followed by id and one or more digits, followed by anything, and prints out the id and digit string (only). Since these only occur in URLs, they will only appear one per line and relatively near the start of the line too.
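A quick way to try this without the real file is to feed the pipeline a short, made-up line containing two URLs. The sample text and the function name extract_ids below are assumptions, not the asker's data.

```shell
#!/bin/sh
# Hypothetical one-line input with two links ending in /id<digits>.
sample='junk http://foo/id231313 junk lalala http://bar/id23123 asda'

# tr turns every ":" into a newline (octal 012), so each URL tail
# ("//host/id...") lands on its own line; sed then prints only the id.
extract_ids() {
    tr ':' '\012' | sed -n 's%.*/\(id[0-9][0-9]*\).*%\1%p'
}

printf '%s\n' "$sample" | extract_ids
```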
Here's a solution using only one invocation of sed:
sed -n 's| |\n|g;/^http/{s|http://[^/]*/id\([0-9]*\)|\1|;P};D' inputfile
Explanation:
s| |\n|g; - Divide and conquer
/^http/{ - If pattern space begins with "http"
s|http://[^/]*/id\([0-9]*\)|\1|; - capture the id
P - Print the string preceding the first newline
}; - end if
D - Delete the string preceding the first newline regardless of whether it contains "http"
Edit:
This version uses the same technique but is more selective.
sed -n 's|http://|\n&|g;/^\n*http/{s|\n*http://[^/]*/id\([0-9]*\)|\1\n|;P};D' inputfile

How can I replace each newline (\n) with a space using sed?

How can I replace a newline ("\n") with a space (" ") using the sed command?
I unsuccessfully tried:
sed 's#\n# #g' file
sed 's#^$# #g' file
How do I fix it?
sed is intended to be used on line-based input, although it can do what you need.
A better option here is to use the tr command as follows:
tr '\n' ' ' < input_filename
or remove the newline characters entirely:
tr -d '\n' < input.txt > output.txt
or if you have the GNU version (with its long options)
tr --delete '\n' < input.txt > output.txt
Use this solution with GNU sed:
sed ':a;N;$!ba;s/\n/ /g' file
This will read the whole file in a loop (:a;N;$!ba), then replace the newline(s) with a space (s/\n/ /g). Additional substitutions can simply be appended if needed.
Explanation:
sed starts by reading the first line excluding the newline into the pattern space.
Create a label via :a.
Append a newline and next line to the pattern space via N.
If we are before the last line, branch to the created label $!ba ($! means not to do it on the last line. This is necessary to avoid executing N again, which would terminate the script if there is no more input!).
Finally the substitution replaces every newline with a space on the pattern space (which is the whole file).
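A minimal way to check the loop is to pipe three short lines through it. The function name join_lines is invented for the example, and GNU sed is assumed for the semicolon-separated form.

```shell
#!/bin/sh
# join_lines is a hypothetical wrapper around the one-liner above.
join_lines() {
    sed ':a;N;$!ba;s/\n/ /g'
}

# The final newline of the input is kept; only the newlines that
# separate lines inside the pattern space are replaced.
printf 'one\ntwo\nthree\n' | join_lines   # one two three
```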
Here is cross-platform compatible syntax which works with BSD and OS X's sed (as per @Benjie's comment):
sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' file
As you can see, using sed for this otherwise simple problem is problematic. For a simpler and adequate solution see this answer.
Fast answer
sed ':a;N;$!ba;s/\n/ /g' file
:a create a label 'a'
N append the next line to the pattern space
$! if not the last line, ba branch (go to) label 'a'
s substitute, /\n/ regex for new line, / / by a space, /g global match (as many times as it can)
sed will loop through steps 1 to 3 until it reaches the last line, so that all lines end up in the pattern space, where sed will substitute all \n characters
Alternatives
All alternatives, unlike sed, do not need to reach the last line to begin the process
with bash, slow
while read line; do printf "%s" "$line "; done < file
with perl, sed-like speed
perl -p -e 's/\n/ /' file
with tr, faster than sed, can replace by one character only
tr '\n' ' ' < file
with paste, tr-like speed, can replace by one character only
paste -s -d ' ' file
with awk, tr-like speed
awk 1 ORS=' ' file
Other alternative like "echo $(< file)" is slow, works only on small files and needs to process the whole file to begin the process.
Long answer from the sed FAQ 5.10
5.10. Why can't I match or delete a newline using the \n escape
sequence? Why can't I match 2 or more lines using \n?
The \n will never match the newline at the end-of-line because the
newline is always stripped off before the line is placed into the
pattern space. To get 2 or more lines into the pattern space, use
the 'N' command or something similar (such as 'H;...;g;').
Sed works like this: sed reads one line at a time, chops off the
terminating newline, puts what is left into the pattern space where
the sed script can address or change it, and when the pattern space
is printed, appends a newline to stdout (or to a file). If the
pattern space is entirely or partially deleted with 'd' or 'D', the
newline is not added in such cases. Thus, scripts like
sed 's/\n//' file # to delete newlines from each line
sed 's/\n/foo\n/' file # to add a word to the end of each line
will NEVER work, because the trailing newline is removed before
the line is put into the pattern space. To perform the above tasks,
use one of these scripts instead:
tr -d '\n' < file # use tr to delete newlines
sed ':a;N;$!ba;s/\n//g' file # GNU sed to delete newlines
sed 's/$/ foo/' file # add "foo" to end of each line
Since versions of sed other than GNU sed have limits to the size of
the pattern buffer, the Unix 'tr' utility is to be preferred here.
If the last line of the file contains a newline, GNU sed will add
that newline to the output but delete all others, whereas tr will
delete all newlines.
To match a block of two or more lines, there are 3 basic choices:
(1) use the 'N' command to add the Next line to the pattern space;
(2) use the 'H' command at least twice to append the current line
to the Hold space, and then retrieve the lines from the hold space
with x, g, or G; or (3) use address ranges (see section 3.3, above)
to match lines between two specified addresses.
Choices (1) and (2) will put an \n into the pattern space, where it
can be addressed as desired ('s/ABC\nXYZ/alphabet/g'). One example
of using 'N' to delete a block of lines appears in section 4.13
("How do I delete a block of specific consecutive lines?"). This
example can be modified by changing the delete command to something
else, like 'p' (print), 'i' (insert), 'c' (change), 'a' (append),
or 's' (substitute).
Choice (3) will not put an \n into the pattern space, but it does
match a block of consecutive lines, so it may be that you don't
even need the \n to find what you're looking for. Since GNU sed
version 3.02.80 now supports this syntax:
sed '/start/,+4d' # to delete "start" plus the next 4 lines,
in addition to the traditional '/from here/,/to there/{...}' range
addresses, it may be possible to avoid the use of \n entirely.
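The point made in the FAQ entry can be checked directly: a plain s/\n/ / leaves the input untouched, while pulling in the next line with N puts a newline into the pattern space that the substitution can then match. The two-line input is made up for the demonstration.

```shell
#!/bin/sh
# Without N the pattern space never contains "\n", so nothing changes.
printf 'a\nb\n' | sed 's/\n/ /'     # prints a and b on separate lines

# N appends the next line, separated by an embedded newline that s can match.
printf 'a\nb\n' | sed 'N;s/\n/ /'   # prints: a b
```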
A shorter awk alternative:
awk 1 ORS=' '
Explanation
An awk program is built up of rules which consist of conditional code-blocks, i.e.:
condition { code-block }
If the code-block is omitted, the default is used: { print $0 }. Thus, the 1 is interpreted as a true condition and print $0 is executed for each line.
When awk reads the input it splits it into records based on the value of RS (Record Separator), which by default is a newline, thus awk will by default parse the input line-wise. The splitting also involves stripping off RS from the input record.
Now, when printing a record, ORS (Output Record Separator) is appended to it, default is again a newline. So by changing ORS to a space all newlines are changed to spaces.
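One consequence worth seeing is that ORS is also written after the last record, so the output ends with a space rather than a newline. The sketch below captures the output and prints it in brackets to make that visible; the sample input is made up.

```shell
#!/bin/sh
# Capture the joined output. Command substitution only strips trailing
# newlines, and there are none here, just the final ORS (a space).
joined=$(printf 'a\nb\nc\n' | awk 1 ORS=' ')
printf '[%s]\n' "$joined"   # [a b c ]
```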
GNU sed has an option, -z, for null-separated records (lines). You can just call:
sed -z 's/\n/ /g'
The Perl version works the way you expected.
perl -i -p -e 's/\n//' file
As pointed out in the comments, it's worth noting that this edits in place. -i.bak will give you a backup of the original file before the replacement in case your regular expression isn't as smart as you thought.
Who needs sed? Here is the bash way:
cat test.txt | while read line; do echo -n "$line "; done
In order to replace all newlines with spaces using awk, without reading the whole file into memory:
awk '{printf "%s ", $0}' inputfile
If you want a final newline:
awk '{printf "%s ", $0} END {printf "\n"}' inputfile
You can use a character other than space:
awk '{printf "%s|", $0} END {printf "\n"}' inputfile
tr '\n' ' '
is the command.
Simple and easy to use.
Three things.
tr (or cat, etc.) is absolutely not needed. (GNU) sed and (GNU) awk, when combined, can do 99.9% of any text processing you need.
stream != line based. ed is a line-based editor. sed is not. See sed lecture for more information on the difference. Most people confuse sed to be line-based because it is, by default, not very greedy in its pattern matching for SIMPLE matches - for instance, when doing pattern searching and replacing by one or two characters, it by default only replaces on the first match it finds (unless specified otherwise by the global command). There would not even be a global command if it were line-based rather than STREAM-based, because it would evaluate only lines at a time. Try running ed; you'll notice the difference. ed is pretty useful if you want to iterate over specific lines (such as in a for-loop), but most of the times you'll just want sed.
That being said,
sed -e '{:q;N;s/\n/ /g;t q}' file
works just fine in GNU sed version 4.2.1. The above command will replace all newlines with spaces. It's ugly and a bit cumbersome to type in, but it works just fine. The {}'s can be left out, as they're only included for sanity reasons.
Why didn't I find a simple solution with awk?
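awk '{printf "%s", $0}' file
printf prints every line without its newline (passing $0 as an argument rather than as the format keeps any % signs in the input intact). If you want to separate the original lines with a space or another string:
awk '{printf "%s ", $0}' file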
The answer with the :a label does not work in FreeBSD 7.2 on the command line:
( echo foo ; echo bar ) | sed ':a;N;$!ba;s/\n/ /g'
sed: 1: ":a;N;$!ba;s/\n/ /g": unused label 'a;N;$!ba;s/\n/ /g'
foo
bar
But does if you put the sed script in a file or use -e to "build" the sed script...
> (echo foo; echo bar) | sed -e :a -e N -e '$!ba' -e 's/\n/ /g'
foo bar
or ...
> cat > x.sed << eof
:a
N
$!ba
s/\n/ /g
eof
> (echo foo; echo bar) | sed -f x.sed
foo bar
Maybe the sed in OS X is similar.
Easy-to-understand Solution
I had this problem. The kicker was that I needed the solution to work on BSD's (Mac OS X) and GNU's (Linux and Cygwin) sed and tr:
$ echo 'foo
bar
baz

foo2
bar2
baz2' \
| tr '\n' '\000' \
| sed 's:\x00\x00.*:\n:g' \
| tr '\000' '\n'
Output:
foo
bar
baz
(has trailing newline)
It works on Linux, OS X, and BSD - even without UTF-8 support or with a crappy terminal.
Use tr to swap the newline with another character.
NULL (\000 or \x00) is nice because it doesn't need UTF-8 support and it's not likely to be used.
Use sed to match the NULL
Use tr to swap back extra newlines if you need them
You can use xargs:
seq 10 | xargs
or
seq 10 | xargs echo -n
cat file | xargs
for the sake of completeness
If you are unfortunate enough to have to deal with Windows line endings, you need to remove the \r and the \n:
tr '\r\n' ' ' < $input > $output
I'm not an expert, but I guess in sed you'd first need to append the next line into the pattern space, by using "N". From the section "Multiline Pattern Space" in "Advanced sed Commands" of the book sed & awk (Dale Dougherty and Arnold Robbins; O'Reilly 1997; page 107 in the preview):
The multiline Next (N) command creates a multiline pattern space by reading a new line of input and appending it to the contents of the pattern space. The original contents of pattern space and the new input line are separated by a newline. The embedded newline character can be matched in patterns by the escape sequence "\n". In a multiline pattern space, the metacharacter "^" matches the very first character of the pattern space, and not the character(s) following any embedded newline(s). Similarly, "$" matches only the final newline in the pattern space, and not any embedded newline(s). After the Next command is executed, control is then passed to subsequent commands in the script.
From man sed:
[2addr]N
Append the next line of input to the pattern space, using an embedded newline character to separate the appended material from the original contents. Note that the current line number changes.
I've used this to search (multiple) badly formatted log files, in which the search string may be found on an "orphaned" next line.
In response to the "tr" solution above, on Windows (probably using the Gnuwin32 version of tr), the proposed solution:
tr '\n' ' ' < input
was not working for me, it'd either error or actually replace the \n w/ '' for some reason.
Using another feature of tr, the "delete" option -d did work though:
tr -d '\n' < input
or '\r\n' instead of '\n'
I used a hybrid approach to get around the newline thing by using tr to replace newlines with tabs, then replacing tabs with whatever I want. In this case, " <br> ", since I'm trying to generate HTML breaks.
echo -e "a\nb\nc\n" | tr '\n' '\t' | sed 's/\t/ <br> /g'
You can also use this method:
sed 'x;G;1!h;s/\n/ /g;$!d'
Explanation
x - exchanges the contents of the pattern space and the hold space.
G - appends the hold space to the pattern space.
h - copies the pattern space to the hold space.
1!h - skips that copy on the first line, because the pattern space then starts with a stray \n.
$!d - clears the pattern space before the next line is read, on every line except the last.
Flow
When the first line is read, the exchange moves 1 into the hold space and leaves the pattern space empty; G then appends the hold space, giving \n1. The copy is skipped, the substitution is performed, and the pattern space is deleted.
On the second line, the exchange moves 2 into the hold space and brings 1 into the pattern space; G appends the hold space to give 1\n2, h copies that to the hold space, the substitution is made, and the pattern space is deleted. This continues until the last line, where the pattern space is not deleted and the complete result is printed.
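To follow this on real input, the sketch below runs the command on three made-up lines and then, separately, uses sed's l command to dump the pattern space right after x;G;1!h on each cycle.

```shell
#!/bin/sh
# Final result: all lines joined with spaces, printed once at the last line.
printf '1\n2\n3\n' | sed 'x;G;1!h;s/\n/ /g;$!d'   # 1 2 3

# Pattern space after x;G;1!h on each cycle, dumped with "l"
# (embedded newlines appear as \n, the end of the buffer as $).
printf '1\n2\n3\n' | sed -n 'x;G;1!h;l'
```

The dump shows the leading \n on the first cycle, which is the reason for the 1!h guard: copying it to the hold space would leave a stray space at the start of the result.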
Bullet-proof solution. Binary-data-safe and POSIX-compliant, but slow.
POSIX sed requires input according to the POSIX text file and POSIX line definitions, so NULL-bytes and too long lines are not allowed and each line must end with a newline (including the last line). This makes it hard to use sed for processing arbitrary input data.
The following solution avoids sed and instead converts the input bytes to octal codes and then to bytes again, but intercepts octal code 012 (newline) and outputs the replacement string in place of it. As far as I can tell the solution is POSIX-compliant, so it should work on a wide variety of platforms.
od -A n -t o1 -v | tr ' \t' '\n\n' | grep . |
while read x; do [ "0$x" -eq 012 ] && printf '<br>\n' || printf "\\$x"; done
POSIX reference documentation: sh, shell command language, od, tr, grep, read, [, printf.
All of read, [, and printf are built-ins in at least bash, but that is probably not guaranteed by POSIX, so on some platforms it could be that each input byte will start one or more new processes, which will slow things down. Even in bash this solution only reaches about 50 kB/s, so it's not suited for large files.
Tested on Ubuntu (bash, dash, and busybox), FreeBSD, and OpenBSD.
In some situations maybe you can change RS to some other string or character. This way, \n is available for sub/gsub:
$ gawk 'BEGIN {RS="dn" } {gsub("\n"," ") ;print $0 }' file
The power of shell scripting is that if you do not know how to do something one way, you can do it another way. And many times you have more things to take into account than building a complex solution to a simple problem.
Regarding the claim that gawk is slow and reads the file into memory: I do not know about this, but to me gawk seems to work on one line at a time and is very fast (not as fast as some of the others, but the time to write and test also counts).
I process MB and even GB of data, and the only limit I found is line size.
Find and replace across newlines (\n), using GNU sed's -z option:
sed -i -z 's/Marker\n/# Marker Comment\nMarker\n/g' myfile.txt
Marker
Becomes
# Marker Comment
Marker
You could use xargs — it will replace \n with a space by default.
However, it would have problems if your input has any case of an unterminated quote, e.g. if the quote signs on a given line don't match.
On Mac OS X (using FreeBSD sed):
# replace each newline with a space
printf "a\nb\nc\nd\ne\nf" | sed -E -e :a -e '$!N; s/\n/ /g; ta'
printf "a\nb\nc\nd\ne\nf" | sed -E -e :a -e '$!N; s/\n/ /g' -e ta
To remove empty lines:
sed -n "s/^$//;t;p;"
Using Awk:
awk "BEGIN { o=\"\" } { o=o \" \" \$0 } END { print o; }"
A solution I particularly like is to append all the file in the hold space and replace all newlines at the end of file:
$ (echo foo; echo bar) | sed -n 'H;${x;s/\n//g;p;}'
foobar
However, someone told me the hold space can be finite in some sed implementations.
Replace newlines with any string, and replace the last newline too
The pure tr solutions can only replace with a single character, and the pure sed solutions don't replace the last newline of the input. The following solution fixes these problems, and seems to be safe for binary data (even with a UTF-8 locale):
printf '1\n2\n3\n' |
sed 's/%/%p/g;s/#/%a/g' | tr '\n' '#' | sed 's/#/<br>/g;s/%a/#/g;s/%p/%/g'
Result:
1<br>2<br>3<br>
It is sed that adds the newlines back after the "normal" substitution. First, it strips the newline character, then it processes the line according to your instructions, then it appends a newline.
Using sed you can replace "the end" of a line (not the newline character) with a string of your choice for each input line, but sed will still output separate lines. For example, suppose you wanted to replace the "end of line" with "===" (more general than replacing it with a single space):
PROMPT~$ cat <<EOF |sed 's/$/===/g'
first line
second line
3rd line
EOF
first line===
second line===
3rd line===
PROMPT~$
To replace the newline character itself with the string, you can, albeit inefficiently, use tr, as pointed out before, to replace the newline characters with a "special char" and then use sed to replace that special char with the string you want.
For example:
PROMPT~$ cat <<EOF | tr '\n' $'\x01'|sed -e 's/\x01/===/g'
first line
second line
3rd line
EOF
first line===second line===3rd line===PROMPT~$