How can I use sed to exclude patterns when joining lines together? - sed

I am trying to use sed to look for lines that start with '1' and join them with the following line, while ignoring lines that start with '1.'
My source file looks like this:
name cat
1
7.75
2
1.27
X
5.10
The desired output is:
name cat
1 7.75
2
1.27
X
5.10
I have a command that looks for lines that start with 1 and joins the following line; however, I also have lines starting with 1.*, which I want to ignore. I have tried the following sed command, using \<\> to try to ignore decimals, but it does not work.
The command I am using is:
sed '/^\<1\>/N;s/\n/ /'
but it gives this output:
name cat
1 7.75
2
1.27 X
5.10
How can I join lines starting with '1' with the following line, while ignoring lines that start with 1.* ?
Edit:
I only want to join lines that contain '1' (nothing else on the line) with the following line
Some lines start with a float, e.g. 1.2; I want to ignore these so the next line is not appended to them.

sed '/^1/{/^1\./!N;s/\n/ /}'
If a line starts with 1 and does not start with 1., append the next line. Then replace the newline with a space.
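To try the nested-address command on the question's sample input, a small self-contained sketch could look like this. The function name join_ones is hypothetical, and the script is split across -e arguments so that the closing } sits on its own line, which is the most portable way to close a block.

```shell
#!/bin/sh
# join_ones: hypothetical helper name used only for this demo.
# A line starting with 1 gets the next line appended, unless it starts with "1.".
join_ones() {
  sed -e '/^1/{' -e '/^1\./!N' -e 's/\n/ /' -e '}'
}

# The sample input from the question.
printf '%s\n' 'name cat' 1 7.75 2 1.27 X 5.10 | join_ones
```

Note that this also joins lines such as 12 or 1x, since only the 1. prefix is excluded.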
Or just:
sed '/^1\([^\.]\|$\)/N;s/\n/ /'
# same without `\(\|\)`
sed '/^1[^\.]/N;/^1$/N;s/\n/ /'
If a line starts with 1 followed by anything other than a dot, or the line ends right after the 1, append the next line. Then replace the newline with a space.
I only want to join lines that contain '1' (nothing else on the line) with the following line
So just match a line that is exactly 1:
sed '/^1$/N;s/\n/ /'
Maybe you want to just match 1 followed by any whitespace?
sed '/^1[[:space:]]*$/N;s/\n/ /'
Or by spaces only?
sed '/^1 *$/N;s/\n/ /'
Sed - An Introduction and Tutorial by Bruce Barnett is a great place to learn how to use sed. To learn regexes, I recommend playing with regex crosswords; they let you learn regexes quickly and have fun doing it.

You can begin your sed script with n, which prints a line that begins with 1. and replaces it with the next line, so the 1. line is never joined:
#!/bin/sed -f
/^1\./n # don't change 1.x
/^1\b/N # \b is GNU sed word-boundary
s/\n/ /
Thus, only lines not beginning with 1. get the following line appended.
Example output:
name cat
1 7.75
2
1.27
X
5.10
According to later comments on the question, it seems you only want to join lines containing 1 and optional trailing spaces, which makes the script much simpler:
#!/bin/sed -f
/^1[[:space:]]*$/N # match the whole line
y/\n/ /

Could you please try the following?
awk '
$0==1{
prev=$0
next
}
prev{
$0=prev OFS $0
prev=""
}
1
END{
if(prev){
print prev
}
}
' Input_file
Explanation: here is a detailed explanation of the above code.
awk ' ##Starting awk program from here.
$0==1{ ##Checking condition: if the line's value is 1, then do the following.
prev=$0 ##Creating variable prev and setting its value to the current line.
next ##next skips all further statements for this line.
}
prev{ ##Checking condition: if variable prev is NOT NULL, then do the following.
$0=prev OFS $0 ##Setting the current line to prev, OFS and the current line.
prev="" ##Nullifying variable prev here.
}
1 ##1 prints the edited/non-edited line here.
END{ ##Starting END block of this awk program here.
if(prev){ ##Checking condition: if variable prev is NOT NULL, then do the following.
print prev ##Printing variable prev here.
} ##Closing BLOCK for if condition here.
} ##Closing END block of this awk program here.
' Input_file ##Mentioning Input_file here.
Output will be as follows:
name cat
1 7.75
2
1.27
X
5.10

Using any awk in any shell on every UNIX box:
$ awk '{printf "%s%s", $0, ($1==1"" ? OFS : ORS)}' file
name cat
1 7.75
2
1.27
X
5.10
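The 1"" in the command above concatenates an empty string onto 1, so the comparison is done as strings. A quick sketch of why that matters: with a plain $1==1, awk compares numerically whenever the field looks like a number, so fields such as 1.0 or 01 would also count as 1 and get joined. The sample fields are made up for illustration.

```shell
#!/bin/sh
# Compare numeric ($1 == 1) and string ($1 == 1"") matching on a few sample fields.
printf '%s\n' 1 1.0 01 1.27 |
  awk '{ print $1, ($1 == 1 ? "numeric-match" : "no"), ($1 == 1"" ? "string-match" : "no") }'
```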
FYI, some (all?) of the sed solutions posted so far rely on non-POSIX functionality, so YMMV depending on which sed you use.
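For the exact-match case, here is a sketch that sticks to POSIX sed features: braces, a negated address, and \n in a regex (which matches the newline that N embeds). Using $!N avoids the last-line difference between implementations: GNU sed prints the pattern space when N has no next line, while most other seds quit without printing it. The function name is hypothetical.

```shell
#!/bin/sh
# join_exact_one: hypothetical helper name.
# Join a line consisting of exactly "1" with the line that follows it.
join_exact_one() {
  sed -e '/^1$/{' -e '$!N' -e 's/\n/ /' -e '}'
}

# Sample input from the question, plus a trailing lone 1 to show it is kept.
printf '%s\n' 'name cat' 1 7.75 2 1.27 X 5.10 1 | join_exact_one
```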

Related

GREP Print Blank Lines For Non-Matches

I want to extract strings between two patterns with GREP, but when no match is found, I would like to print a blank line instead.
Input
This is very new
This is quite old
This is not so new
Desired Output
is very
is not so
I've attempted:
grep -o -P '(?<=This).*?(?=new)'
But this does not print the blank line for the second input line in the above example. I have searched for over an hour and tried a few things, but nothing has worked out.
I will happily use a solution in sed if that's easier!
You can use
#!/bin/bash
s='This is very new
This is quite old
This is not so new'
sed -En 's/.*This(.*)new.*|.*/\1/p' <<< "$s"
See the online demo, which yields:
is very
is not so
Details:
E - enables POSIX ERE regex syntax
n - suppresses default line output
s/.*This(.*)new.*|.*/\1/ - finds any text, This, any text (captured into Group 1, \1), new and then any text again, or (as the alternative) the whole string (in sed, the line), and replaces the match with the Group 1 value.
p - prints the result of the substitution.
And this is what you need for your actual data:
sed -En 's/.*"user_ip":"([^"]*).*|.*/\1/p'
See this online demo. The [^"]* matches zero or more chars other than a " char.
With your shown samples, please try the following awk code.
awk -F'This\\s+|\\s+new' 'NF==3{print $2;next} NF!=3{print ""}' Input_file
OR
awk -F'This\\s+|\\s+new' 'NF==3{print $2;next} {print ""}' Input_file
Explanation: This\\s+ OR \\s+new is set as the field separator for all lines of Input_file. In the main program, if NF (number of fields) is 3, print the 2nd field (next then moves on to the next line). Otherwise, when NF is NOT equal to 3, simply print a blank line.
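\s is one of GNU awk's regexp extensions, so other awks may not accept it. A sketch of the same field-separator idea using a POSIX bracket expression instead, with FS set in a BEGIN block; the assumption is that spaces and tabs are the only separators you care about, and the function name is hypothetical.

```shell
#!/bin/sh
# between_this_new: hypothetical helper name.
# Print the text between "This" and "new", or an empty line when a line has no match.
between_this_new() {
  awk 'BEGIN { FS = "This[ \t]+|[ \t]+new" }
       NF == 3 { print $2; next }
       { print "" }'
}

printf '%s\n' 'This is very new' 'This is quite old' 'This is not so new' | between_this_new
```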
sed:
sed -E '
/This.*new/! s/.*//
s/.*This(.*)new.*/\1/
' file
first line: for lines not matching "This.*new", remove all characters, leaving a blank line
second line: for lines matching the pattern, keep only the "middle" text
Note that this is not the PCRE non-greedy match: the line
This is new but that is not new
will produce the output
is new but that is not
To continue to use PCRE, use perl:
perl -lpe '$_ = /This(.*?)new/ ? $1 : ""' file
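If perl is not available, a sketch of the same non-greedy behaviour in plain awk: index() finds the first "This" and then the first "new" after it, so no greedy regex is involved. The function name is hypothetical; as with the perl version, the spaces around the captured text are kept.

```shell
#!/bin/sh
# first_between: hypothetical helper name.
# Print the text between the first "This" and the first "new" after it,
# or an empty line when either marker is missing (non-greedy, like perl's .*?).
first_between() {
  awk '{
    i = index($0, "This")
    if (i > 0) {
      rest = substr($0, i + length("This"))
      j = index(rest, "new")
      if (j > 0) { print substr(rest, 1, j - 1); next }
    }
    print ""
  }'
}

printf '%s\n' 'This is quite old' 'This is new but that is not new' | first_between
```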
This might work for you:
sed -E 's/.*This(.*)new.*|.*/\1/' file
If the first alternative matches, the line is replaced by everything between This and new.
Otherwise the second alternative matches and removes everything.
N.B. The substitution will always match one of the conditions. The solution was suggested by Wiktor Stribiżew.

How to remove empty lines to one empty line between sentences in text files?

I have a text file with many empty lines between sentences. I tried sed, gawk and grep, but they don't work. :( What can I do now? Thanks.
Myfile:
a
b
c
.
(several empty lines)
d
e
f
g
.
(several empty lines)
h
i
j
k
.
Desired file:
a
b
c
.
(one empty line)
d
e
f
g
.
(one empty line)
h
i
j
k
.
You can use awk for this:
awk 'BEGIN{prev="x"}
/^$/ {if (prev==""){next}}
{prev=$0;print}' inputFile
or the compressed one liner:
awk 'BEGIN{p="x"}/^$/{if(p==""){next}}{p=$0;print}' inFl
This is a simple state machine that collapses multi-blank-lines into a single one.
The basic idea is this. First, set the previous line to be non-empty.
Then, for every line in the file, if it and the previous one are blank, just throw it away.
Otherwise, set the previous line to that value, print the line, and carry on.
Sample transcript, the following command:
$ echo '1
2
3
4
5
6
7
8
9
10' | awk 'BEGIN{p="x"}/^$/{if(p==""){next}}{p=$0;print}'
outputs:
1
2
3
4
5
6
7
8
9
10
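A self-contained way to try the same awk command, with the empty lines written as printf escapes so they stay visible in the source. The function name and the input are made up for illustration.

```shell
#!/bin/sh
# squeeze_blank: hypothetical helper name wrapping the awk command above.
# Collapse each run of empty lines into a single empty line.
squeeze_blank() {
  awk 'BEGIN{p="x"}/^$/{if(p==""){next}}{p=$0;print}'
}

# Three empty lines after 1, one after 2, none after 3.
printf '1\n\n\n\n2\n\n3\n4\n' | squeeze_blank
```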
Keep in mind that this is for truly blank lines (no content). If you're trying to collapse lines that have an arbitrary number of spaces or tabs, that will be a little trickier.
In that case, you could pipe the file through something like:
sed 's/^\s*$//'
to ensure lines with just whitespace become truly empty.
In other words, something like:
sed 's/^\s*$//' infile | awk 'my previous awk command'
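Alternatively, here is a sketch that avoids the extra sed step (and the GNU-specific \s): with awk's default field separator, NF is 0 for a line that is empty or contains only spaces and tabs, so one command can squeeze both kinds. Unlike the command above, this version drops blank lines at the very start of the file, and the one blank line it keeps from each run is printed unchanged, so it may still contain whitespace. The function name is hypothetical.

```shell
#!/bin/sh
# squeeze_ws_blank: hypothetical helper name.
# Print a line if it has fields, or if it is the first blank line after a non-blank one.
squeeze_ws_blank() {
  awk 'NF || prev_nf { print } { prev_nf = NF }'
}

# An empty line, a space+tab line and another empty line between a and b.
printf 'a\n\n \t\n\nb\n' | squeeze_ws_blank
```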
To suppress repeated empty output lines with GNU cat:
cat -s file1 > file2
Here's one way using sed:
sed ':a; N; $!ba; s/\n\n\+/\n\n/g' file
Otherwise, if you don't mind a trailing blank line, all you need is:
awk '1' RS= ORS="\n\n" file
The Perl solution is even shorter:
perl -00 -pe '' file
You could also do it like this:
awk -v RS="\0" '{gsub(/\n\n+/,"\n\n");}1' file
Explanation:
RS="\0": once the null character is set as the record separator, awk reads the whole file as a single record.
gsub(/\n\n+/,"\n\n"); replaces one or more blank lines with a single blank line. Note that the \n\n regex matches a blank line along with the previous line's newline character.
Here is another awk solution:
awk -v p=1 '$0=="" && p=="" {next} 1; {p=$0}' file

sed: replace pattern only if followed by empty line

I need to replace a pattern in a file only if it is followed by an empty line. Suppose I have the following file:
test
test

test

...
The following command would replace all occurrences of test with xxx:
cat file | sed 's/test/xxx/g'
but I need to replace test only if the next line is empty. I have tried matching a hex code, but that doesn't work:
cat file | sed 's/test\x0a/xxx/g'
The desired output should look like this:
test
xxx

xxx

...
Suggested solutions for sed, perl and awk:
sed
sed -rn '1h;1!H;${g;s/test([^\n]*\n\n)/xxx\1/g;p;}' file
I got the idea from sed multiline search and replace. Basically slurp the entire file into sed's hold space and do global replacement on the whole chunk at once.
perl
$ perl -00 -pe 's/test(?=[^\n]*\n\n)$/xxx/m' file
-00 triggers paragraph mode, which makes perl read chunks separated by one or more empty lines (just what the OP is looking for). A positive lookahead (?=...) anchors the substitution to the last line of the chunk.
Caveat: -00 will squash multiple empty lines into single empty lines.
awk
$ awk 'NR==1 {l=$0; next}
/^$/ {gsub(/test/,"xxx", l)}
{print l; l=$0}
END {print l}' file
Basically store previous line in l, substitute pattern in l if current line is empty. Print l. Finally print the very last line.
Output in all three cases:
test
xxx

xxx

...
This might work for you (GNU sed):
sed -r '$!N;s/test(\n\s*)$/xxx\1/;P;D' file
Keep a window of 2 lines throughout the length of the file and if the second line is empty and the first line contains the pattern then make a substitution.
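A sketch of the same two-line window that avoids the GNU-specific -r and \s, using only POSIX BRE features: \( \) for the group and [[:space:]] in place of \s. The function name and the sample input are illustrative.

```shell
#!/bin/sh
# replace_before_blank: hypothetical helper name.
# Replace "test" at the end of a line when the next line is empty (or whitespace only).
replace_before_blank() {
  sed -e '$!N' -e 's/test\(\n[[:space:]]*\)$/xxx\1/' -e 'P' -e 'D'
}

printf 'test\ntest\n\ntest\n\nend\n' | replace_before_blank
```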
Using sed
sed -r ':a;$!{N;ba};s/test([^\n]*\n(\n|$))/xxx\1/g'
explanation
:a # set label a
$!{ # if this is not the last line
N # add a newline to the pattern space, then append the next line of input
b a # branch unconditionally to label a (if the label is omitted, the next cycle is started)
}
# in short, :a;$!{N;ba} reads the whole file into the pattern space.
s/test([^\n]*\n(\n|$))/xxx\1/g # replace the keyword if the next line is empty (\n\n), or if that empty line is the last line of the file (\n$)

sed: joining lines depending on the second one

I have a file that occasionally has split lines. The split is signaled by the fact that the line starts with '+' (possibly preceded by spaces).
line 1
line 2
+ continue 2
line 3
...
I'd like to join the split line back:
line 1
line 2 continue 2
line 3
...
using sed. I'm not clear on how to join a line with the preceding one.
Any suggestion?
This might work for you:
sed 'N;s/\n\s*+//;P;D' file
These are actually four commands:
N
Append a newline and the next line of input to the pattern space
s/\n\s*+//
Remove newline, following whitespace and the plus
P
Print the pattern space up to the first newline
D
Delete the pattern space up to the first newline, i.e. the part which was just printed, and start the next cycle with whatever remains
The relevant manual page parts are
Selecting lines by numbers
Addresses overview
Multiline techniques - using D,G,H,N,P to process multiple lines
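The sed command in this answer joins one continuation line per cycle: after a join the pattern space has no newline left, so D starts a fresh cycle and a second consecutive + line is printed on its own, still starting with +. A sketch that keeps looping with t as long as a join just happened; the function name and input are illustrative, and [[:space:]] replaces the GNU-specific \s.

```shell
#!/bin/sh
# join_plus_lines: hypothetical helper name.
# Join every line that starts with optional whitespace and "+" onto the line before it.
# t loops back to :a (to read one more line) as long as the last s/// succeeded.
join_plus_lines() {
  sed -e ':a' -e '$!N' -e 's/\n[[:space:]]*+//' -e 'ta' -e 'P' -e 'D'
}

printf '%s\n' 'line 1' 'line 2' '+ continue 2' '+ continue 2 even more' 'line 3' '+ continued' |
  join_plus_lines
```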
Doing this in sed is certainly a good exercise, but it's pretty trivial in perl:
perl -0777 -pe 's/\n\s*\+//g' input
I'm not partial to sed so this was a nice challenge for me.
sed -n '1{h;n};/^ *+ */{s// /;H;n};{x;s/\n//g;p};${x;p}'
In awk this is approximately:
awk '
NR == 1 {hold = $0; next}
/^ *\+/ {$1 = ""; hold=hold $0; next}
{print hold; hold = $0}
END {if (hold) print hold}
'
If the last line is a "+" line, the sed version will print a trailing blank line. Couldn't figure out how to suppress it.
You can use Vim in Ex mode:
ex -sc g/+/-j -cx file
g global search
- select previous line
j join with next line
x save and close
A different use of the hold space: load the entire file into the hold space before merging lines. (The \s below is a GNU extension; a strictly POSIX sed needs [[:space:]] instead.)
sed -n '1x;1!H;${g;s/\n\s*+//g;p}'
1x on the first line, swap the line into the empty hold space
1!H on non-first lines, append to the hold space
$ on the last line:
g get the hold space (the entire file)
s/\n\s*+//g replace newlines preceding +
p print everything
Input:
line 1
line 2
+ continue 2
+ continue 2 even more
line 3
+ continued
becomes
line 1
line 2 continue 2 continue 2 even more
line 3 continued
This (or potong's answer) might be more useful than a sed -z implementation if you want to manipulate the data in other ways: you can simply add those commands before 1!H, whereas sed -z loads the entire file into the pattern space immediately. That means you aren't manipulating single lines at any point. The same goes for perl -0777.
For example, if you also want to eliminate comment lines starting with *, add /^\s*\*/d to delete them:
sed -n '1x;/^\s*\*/d;1!H;${g;s/\n\s*+//g;p}'
versus:
sed -z 's/\n\s*+//g;s/\n\s*\*[^\n]*\n/\n/g'
The former's accumulation in the hold space line by line keeps you in classic sed line processing land, while the latter's sed -z dumps you into what could be some painful substring regexes.
But that's sort of an edge case, and you could always just pipe sed -z back into sed. So +1 for that.
Footnote for internet searches: This is SPICE netlist syntax.
A solution for versions of sed that can read NUL-separated data, like GNU sed's -z here:
sed -z 's/\n\s*+//g'
Compared to potong's solution this has the advantage of being able to join multiple lines that start with +. For example:
line 1
line 2
+ continue 2
+ continue 2 even more
line 3
becomes
line 1
line 2 continue 2 continue 2 even more
line 3

join 2 consecutive rows under condition

I have 5 lines like:
typeA;pointA1
typeA;pointA2
typeA;pointA3
typeB;pointB1
typeB;pointB2
The resulting output would be:
typeA;pointA1;typeA;pointA2
typeA;pointA2;typeA;pointA3
typeB;pointB1;typeB;pointB2
Is it possible to use sed or awk for this purpose?
This is easy with awk:
awk -F';' '$1 == prevType { printf("%s;%s;%s\n", $1, prevPoint, $0) } { prevType = $1; prevPoint = $2 }'
I've assumed that the blank lines between the records are not part of the input; if they are, just run the input through grep -v '^$' before awk.
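If the blank lines are part of the input, here is a sketch that skips them inside awk instead of piping through grep. An empty record always has NF equal to 0, so NF == 0 catches truly empty lines (a line holding only spaces has one field here, because the separator is ; rather than whitespace). The function name is hypothetical.

```shell
#!/bin/sh
# pair_same_type: hypothetical helper name.
# Pair each record with the previous one when both share the same first field,
# ignoring empty lines between records.
pair_same_type() {
  awk -F';' 'NF == 0 { next }
             $1 == prevType { printf("%s;%s;%s\n", $1, prevPoint, $0) }
             { prevType = $1; prevPoint = $2 }'
}

printf 'typeA;pointA1\n\ntypeA;pointA2\n\ntypeA;pointA3\n\ntypeB;pointB1\n\ntypeB;pointB2\n' |
  pair_same_type
```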
paste could be useful in this case; it can save a lot of code:
sed '1d' file|paste -d";" file -|awk -F';' '$1==$3'
See the test below:
kent$ cat a
typeA;pointA1
typeA;pointA2
typeA;pointA3
typeB;pointB1
typeB;pointB2
kent$ sed '1d' a|paste -d";" a -|awk -F';' '$1==$3'
typeA;pointA1;typeA;pointA2
typeA;pointA2;typeA;pointA3
typeB;pointB1;typeB;pointB2
This GNU sed solution might work for you:
sed -rn '1{h;b};H;x;/^([^;]*);.*\n\1/!{s/.*\n//;x;d};s/\n/;/p' source_file
This assumes there are no blank lines; otherwise, preprocess the source file with sed '/^$/d' source_file.
EDIT:
On reflection the above solution is far too elaborate and can be condensed to:
sed -ne '1{h;b};H;x;/^\([^;]*\);.*\1/s/\n/;/p' source_file
Explanation:
The -n prevents any lines from being implicitly printed. The first line is copied to the hold space (HS, an extra register) and then a branch (b with no label) ends the iteration. All subsequent lines are appended to the HS. The HS is then swapped with the pattern space (PS, a register holding the current line). The PS at this point contains the previous and current lines, which are now checked to see if the first field in each line is identical. If so, the newline separating the two lines is replaced by a ; and, provided the substitution occurred, the PS is printed out. The next iteration then takes place: the current line refreshes the PS, and the HS now holds the previous line.