replace two newlines with one in a shell command line - perl

There are a lot of questions about replacing multiple newlines with one newline, but none of them work for me.
I have a file:
first line
second line MARKER

third line MARKER

other lines
many other lines
I need to replace two newlines (if they exist) after MARKER to one newline. A result file should be:
first line
second line MARKER
third line MARKER
other lines
many other lines
I tried sed ':a;N;$!ba;s/MARKER\n\n/MARKER\n/g'. Fail.
sed is useful for single-line replacements but has problems with newlines: it can't find \n\n.
I tried perl -i -p -e 's/MARKER\n\n/MARKER\n/g'. Fail.
This solution looks closer, but it seems the regexp doesn't react to \n\n.
Is it possible to replace \n\n only after MARKER and not to replace other \n\n in the file?
I am interested in a one-line solution, not scripts.

I think you were on the right track. In a multi-line program, you would load the entire file into a single scalar and run this substitution on it:
s/MARKER\n\n/MARKER\n/g
The trick to getting a one-liner to load a file into a multi-line string is to set $/ in a BEGIN block. This code will get executed once, before the input is read.
perl -i -pe 'BEGIN{$/=undef} s/MARKER\n\n/MARKER\n/g' input
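For example, running it on the lines from the question (input generated with printf here purely for the demo, and without -i so nothing is modified):
$ printf 'second line MARKER\n\nthird line MARKER\n\nother lines\n' | perl -pe 'BEGIN{$/=undef} s/MARKER\n\n/MARKER\n/g'
second line MARKER
third line MARKER
other lines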

Your Perl solution doesn't work because you are searching for lines that contain two newlines. There is no such thing. Here's one solution:
perl -ne'print if !$m || !/^$/; $m = /MARKER$/;' infile > outfile
Or in-place:
perl -i~ -ne'print if !$m || !/^$/; $m = /MARKER$/;' file
If you're ok with loading the entire file into memory, you can use
perl -0777pe's/MARKER\n\n/MARKER\n/g;' infile > outfile
or
perl -0777pe's/MARKER\n\K\n//g;' infile > outfile
As above, you can use -i~ to edit in-place. Remove the ~ if you don't want to make a backup.
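To illustrate the flag-based variant on made-up input (fed through a pipe just for the demo), note that only the empty line directly after a MARKER line is removed:
$ printf 'a MARKER\n\nb\n\nc\n' | perl -ne'print if !$m || !/^$/; $m = /MARKER$/;'
a MARKER
b

c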

awk:
kent$ cat a
first line
second line MARKER

third line MARKER

other lines
many other lines
kent$ awk 'BEGIN{RS="\x034"} {gsub(/MARKER\n\n/,"MARKER\n");printf $0}' a
first line
second line MARKER
third line MARKER
other lines
many other lines

See sed one-liners.

awk '
marker { marker = 0; if (/^$/) next }
/MARKER/ { marker = 1 }
{ print }
'

This can be done with very simple sed.
sed '/MARKER$/{n;/./!d}'
On a line ending in MARKER, n prints the current line and reads the next one; if that next line is empty it is deleted.
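For example (the input text is arbitrary and piped in with printf just for the demo):
$ printf 'x MARKER\n\ny\n\nz\n' | sed '/MARKER$/{n;/./!d}'
x MARKER
y

z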

This might work for you:
sed '/MARKER/,//{//!d}'
Explanation:
Deletes all lines between MARKER lines, preserving the MARKER lines themselves.
Or:
sed '/MARKER/{n;N;//D}'
Explanation:
Print the MARKER line and read the next line (n), then append the line after that (N). If the newly appended line is also a MARKER line, delete the line in between (D).

Related

Can't replace '\n' with '\\' for whatever reason

I have a whole bunch of files, and I wish to change something like this:
My line of text

My other line of text
Into
My line of text\\
My other line of text
Seems simple, but somehow it isn't. I have tried sed s,"\n\n","\\\\\n", as well as tr '\n' '\\' and about 20 other incarnations of these commands.
There must be something going on that I don't understand... but I'm completely lost as to why nothing is working. I've had some comical things happen too, like when cat'ing out the file it doesn't print newlines and just writes over what was already there.
Does anyone know how to accomplish this?
sed works on lines. It fetches a line, applies your code to it, fetches the next line, and so forth. Since lines are treated individually, multiline regexes don't work quite so easily.
In order to use multiline regexes with sed, you have to first assemble the file in the pattern space and then work on it:
sed ':a $!{ N; ba }; s/\n\n/\\\\\n/g' filename
The trick here is the
:a $!{ N; ba }
This works as follows:
:a # jump label for looping
$!{ # if the end of the input has not been reached
N # fetch the next line and append it to what we already have
ba # go to :a
}
Once this is over, the whole file is in the pattern space, and multiline regexes can be applied to it. Of course, this requires that the file is small enough to fit into memory.
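A quick check with GNU sed (the sample text is illustrative and piped in with printf instead of using a file):
$ printf 'My line of text\n\nMy other line of text\n' | sed ':a $!{ N; ba }; s/\n\n/\\\\\n/g'
My line of text\\
My other line of text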
sed is line-oriented and so is the wrong tool for problems that span lines. You just need to use a record-oriented tool like awk:
$ awk -v RS='^$' -v ORS= '{gsub(/\n\n/,"\\\\\n")}1' file
My line of text\\
My other line of text
The above uses GNU awk for multi-char RS.
Here is an awk that solves this:
If the blank lines could contain tabs or spaces, use this:
awk '!NF{a=a"//"} b{print a} {a=$0;b=NF} END {print a}' file
My line of text//
My other line of text
If the blank lines contain nothing at all, this should do:
awk '!NF{a=a"//"} a!=""{print a} {a=$0} END {print a}' file
This might work for you (GNU sed):
sed 'N;s|\n$|//|;P;D' file
This keeps 2 lines in the pattern space at any point in time and replaces an empty line by a double slash.
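For example, with GNU sed (sample text piped in with printf just for the demo):
$ printf 'My line of text\n\nMy other line of text\n' | sed 'N;s|\n$|//|;P;D'
My line of text//
My other line of text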

sed: replace pattern only if followed by empty line

I need to replace a pattern in a file, only if it is followed by an empty line. Suppose I have following file:
test
test

test

...
the following command would replace all occurrences of test with xxx
cat file | sed 's/test/xxx/g'
but I need to only replace test if the next line is empty. I have tried matching a hex code, but that does not work:
cat file | sed 's/test\x0a/xxx/g'
The desired output should look like this:
test
xxx

xxx

...
Suggested solutions for sed, perl and awk:
sed
sed -rn '1h;1!H;${g;s/test([^\n]*\n\n)/xxx\1/g;p;}' file
I got the idea from sed multiline search and replace. Basically slurp the entire file into sed's hold space and do global replacement on the whole chunk at once.
perl
$ perl -00 -pe 's/test(?=[^\n]*\n\n)$/xxx/m' file
-00 triggers paragraph mode which makes perl read chunks separated by one or several empty lines (just what OP is looking for). Positive look ahead (?=) to anchor substitution to the last line of the chunk.
Caveat: -00 will squash multiple empty lines into single empty lines.
awk
$ awk 'NR==1 {l=$0; next}
/^$/ {gsub(/test/,"xxx", l)}
{print l; l=$0}
END {print l}' file
Basically store previous line in l, substitute pattern in l if current line is empty. Print l. Finally print the very last line.
Output in all three cases
test
xxx

xxx

...
This might work for you (GNU sed):
sed -r '$!N;s/test(\n\s*)$/xxx\1/;P;D' file
Keep a window of 2 lines throughout the length of the file and if the second line is empty and the first line contains the pattern then make a substitution.
Using sed
sed -r ':a;$!{N;ba};s/test([^\n]*\n(\n|$))/xxx\1/g'
explanation
:a # set label a
$ !{ # if not end of file
N # Add a newline to the pattern space, then append the next line of input to the pattern space
b a # Unconditionally branch to label. The label may be omitted, in which case the next cycle is started.
}
# in short, the command :a;$!{N;ba} above reads the whole file into the pattern space.
s/test([^\n]*\n(\n|$))/xxx\1/g # replace the keyword if the next line is empty (\n\n) or this is the last line ($)

multiline pattern delete with single line command

I would like to delete all empty segments in my file.
An empty segment is specified by a pair of consecutive lines, starting with START and ending with END. Valid segments will have some content between the lines starting with START and ending with END.
Sample Input
Header
START arguments
END
Any contents
START arguments
...
something
...
END
Footer
Desired Output
Header
Any contents
START arguments
...
something
...
END
Footer
Here I'm looking for possible one-liners. Any help would be appreciated.
Trials
I tried the following awk. It works to some extent, but it deletes START lines even in valid segments.
awk '/^START/ && getline && /^END$/ {next} 1' file
perl -00 -pe 's/START .*?\nEND//g' file
This is a better one.
The solution I gave earlier will discard whole paragraphs if the segments are not separated by blank lines.
Earlier response below:
How about this perl one-liner?
perl -00 -ne 'print if not /START .*\nEND/' file
Read the file in paragraph mode and discard paragraphs matching START <string><newline>END.
While people were suggesting nice solutions, I came up with an alternative solution using sed:
sed '/^START/N;/^START.*END$/d' file
Or as suggested by @jthill:
sed '/^START/N; /\nEND$/d' file
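A quick check on a trimmed-down version of the sample input (the x/y arguments and the "Keep me" line are placeholders):
$ printf 'Header\nSTART x\nEND\nKeep me\nSTART y\nbody\nEND\nFooter\n' | sed '/^START/N; /\nEND$/d'
Header
Keep me
START y
body
END
Footer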
gawk only
awk -v RS='START[^\n]*\nEND\n' '{printf "%s", $0}' file.txt
Perhaps the following will be helpful:
perl -ne 'print /^START/?do{$x=<>;$_,$x if $x!~/^END/}:$_' inFile
Output on your dataset:
Header
Any contents
START arguments
...
something
...
END
Footer
$ awk '{rec = rec $0 RS} END{ gsub(/START[^\n]*\nEND\n/,"",rec); printf "%s", rec }' file
Header
Any contents
START arguments
...
something
...
END
Footer
# hold a START line until we know whether its segment is empty
/^START/ {
    startline = $0
    next
}
# END immediately after a held START line: empty segment, drop both lines
/^END$/ && startline {
    startline = ""
    next
}
# any other line means the held START line begins a valid segment, so print it
startline {
    print startline
}
{ startline = "" }
1
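To run it, save the program to a file and point awk at it with -f (the script name and input file name below are just placeholders):
awk -f drop_empty_segments.awk file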

put each paragraph of a file onto a separate line

I have a file that contains sequence data, where each new paragraph (separated by two blank lines) contains a new sequence:
#example
ASDHJDJJDMFFMF
AKAKJSJSJSL---
SMSM-....SKSKK
....SK


SKJHDDSNLDJSCC
AK..SJSJSL--HG
AHSM---..SKSKK
-.-GHH
and I want to end up with a file looking like:
ASDHJDJJDMFFMFAKAKJSJSJSL---SMSM-....SKSKK....SK
SKJHDDSNLDJSCCAK..SJSJSL--HGAHSM---..SKSKK-.-GHH
each sequence is the same length (if that helps).
I would also be looking to do this over multiple files stored in different directories.
I have just tried
sed -e '/./{H;$!d;}' -e 'x;/regex/!d' ./text.txt
However this just deleted the entire file :S
Any help would be appreciated - doesn't have to be in sed; if you know how to do it in perl or something else then that's also great.
Thanks.
All you're asking to do is convert a file of blank-line-separated records (RS) where each field is separated by newlines into a file of newline-separated records where each field is separated by nothing (OFS). Just set the appropriate awk variables and recompile the record:
$ awk '{$1=$1}1' RS= OFS= file
ASDHJDJJDMFFMFAKAKJSJSJSL---SMSM-....SKSKK....SK
SKJHDDSNLDJSCCAK..SJSJSL--HGAHSM---..SKSKK-.-GHH
awk '
/^[[:space:]]*$/ {if (line) print line; line=""; next}
{line=line $0}
END {if (line) print line}
'
perl -00 -pe 's/\n//g; $_.="\n"'
For multiple files:
# adjust your glob pattern to suit,
# don't be shy to ask for assistance
for file in */*.txt; do
newfile="/some/directory/$(basename "$file")"
perl -00 -pe 's/\n//g; $_.="\n"' "$file" > "$newfile"
done
A Perl one-liner, if you prefer:
perl -nle 'BEGIN{$/=""};s/\n//g;print $_' file
The $/ variable is the equivalent of awk's RS variable. When set to the empty string ("") it causes two or more empty lines to be treated as one empty line. This is the so-called "paragraph mode" of reading. For each record read, all newline characters are removed. The -l switch adds a newline to the end of each output string, thus giving the desired result.
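For example (the sequences here are shortened placeholders):
$ printf 'AB\nCD\n\nEF\nGH\n' | perl -nle 'BEGIN{$/=""};s/\n//g;print $_'
ABCD
EFGH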
Just try to find those double line breaks (\n\n, or \r\n\r\n with Windows line endings) and replace them first with a special marker like :$:.
After that, replace every remaining line break with an empty string to get the whole file on one line.
Next, replace your special marker with a single line break :)
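A minimal sketch of that three-step idea as a single slurped perl command (assuming Unix \n line endings; the \x01 marker is an arbitrary byte assumed not to occur in the data, and 'file' is a placeholder):
perl -0777 -pe '
  s/\n{2,}/\x01/g;   # step 1: mark each blank-line gap with a control character
  s/\n//g;           # step 2: remove the remaining single newlines
  s/\x01/\n/g;       # step 3: turn each marker back into one newline
  $_ .= "\n";        # make sure the last joined line still ends with a newline
' file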

Perl from command line: when replacing a string in a file it also removes the newlines

I'm using perl from the command line to replace duplicate spaces in a text file.
The command I use is:
perl -pi -e 's/\s+/ /g' file.csv
The problem: this procedure also removes the newlines in the resulting file...
Any idea why this occurs?
Thanks!
\s means the five characters: [ \f\n\r\t]. So, you're replacing newlines by single spaces.
In your case, the simplest way is to enable automatic line-ending processing with the -l flag:
perl -pi -le 's/\s+/ /g' file.csv
This way, newlines will be chomped before -e statement and appended after.
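For instance, dropping -i and feeding sample text through a pipe just to see the effect:
$ printf 'a  b\nc   d\n' | perl -ple 's/\s+/ /g'
a b
c d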
I'll add my two cents to the previous answer.
If you use this regexp in a perl script itself, then you can just change it to:
s/[ ]+/ /gis;
That will change every line and won't delete line endings.