Alternatives to grep/sed that treat newlines as just another character

Both grep and sed handle input line by line and, as far as I know, getting either of them to handle multiple lines isn't very straightforward. What I'm looking for is an alternative (or alternatives) to these two programs that treats newlines as just another character. Is there any tool that fits such a criterion?

The tool you want is awk. It is record-oriented, not line-oriented, and you can specify your record-separator by setting the builtin variable RS. In particular, GNU awk lets you set RS to any regular expression, not just a single character.
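As a small sketch of what that looks like (the sample text and the choice of ';' as separator are made up for illustration), a single-character RS already makes the newline ordinary data inside each record:

```shell
# Records are separated by ';' instead of newline, so a newline inside a
# record is just another character that the program can match or replace.
records=$(printf 'alpha\nbeta;gamma\ndelta;' |
  awk -v RS=';' '{ gsub(/\n/, " "); print NR ": " $0 }')
printf '%s\n' "$records"
```

A single-character RS like this should work in any POSIX awk; only the regular-expression form needs gawk (or another awk that supports it).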

Here is an example where awk treats blank lines as the record separator. If you show us what data you have, we can help you with it.
cat file
first line
second line
third line

fourth line
fifth line
sixth line

seventh line
eight line

more data
Running awk on this reconstructs the data, using each blank line as the start of a new record.
awk -v RS= '{$1=$1}1' file
first line second line third line
fourth line fifth line sixth line
seventh line eight line
more data
PS: RS is not being set to file here. RS= with nothing after it sets RS to the empty string, the same as RS="", which puts awk in paragraph mode.
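Building on the empty-RS idea, here is a hedged sketch (the name/size sample data is invented) that also sets FS to a newline, so that each line of a blank-line-separated block becomes one field:

```shell
# RS="" selects paragraph mode; FS="\n" makes every line of a block a field.
summary=$(printf 'name: a\nsize: 1\n\nname: b\nsize: 2\n' |
  awk 'BEGIN { RS = ""; FS = "\n" } { print NF " lines: " $1 " | " $2 }')
printf '%s\n' "$summary"
```

This is handy when the number of lines per block is fixed and you want to address them as $1, $2 and so on.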

1) sed can handle a block of lines together; it does not always have to work line by line.
In sed, I normally use :loop; $!{N; b loop}; to pull all the lines into the pattern space, delimited by newlines.
Sample:
Productivity
Google Search\
Tips
"Web Based Time Tracking,
Web Based Todo list and
Reduce Key Stores etc"
Result (remove the content between the double quotes):
sed -e ':loop; $!{N; b loop}; s/\"[^\"]*\"//g' thegeekstuff.txt
Productivity
Google Search\
Tips
You should read this tutorial (Unix Sed Tutorial: 6 Examples for Sed Branching Operation); it explains in detail how this works.
http://www.thegeekstuff.com/2009/12/unix-sed-tutorial-6-examples-for-sed-branching-operation/
2) For grep, check whether your grep supports the -z option, with which input no longer has to be handled line by line.
-z, --null-data
Treat the input as a set of lines, each terminated by a zero
byte (the ASCII NUL character) instead of a newline. Like the
-Z or --null option, this option can be used with commands like
sort -z to process arbitrary file names.
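For point 1, the same slurp loop can be spelled with separate -e expressions, which I believe is the more portable form across sed implementations. This is only a sketch on a made-up one-off input:

```shell
# Hypothetical sample: a quoted string that spans two lines.
# The loop appends every remaining line to the pattern space, then a single
# substitution sees the embedded newline as an ordinary character.
cleaned=$(printf 'keep "drop\nthis" keep\n' |
  sed -e ':loop' -e '$!{' -e 'N' -e 'b loop' -e '}' -e 's/"[^"]*"//g')
printf '%s\n' "$cleaned"
```

Keep in mind that this holds the whole input in memory, so it is best suited to small files.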

Related

I want to print the last line of group using sed

I have a file, shown below:
Section1
George, 1998-1995
Peter, 1999-1990
Simon, 1988-1960
Section2
Gery, 2019-2015
John, 1984-1983
Thomson, 1978-1965
When I give Section1, the expected output is
Simon, 1988-1960
I have lots of sections like this. I want to achieve this with sed, not awk.
I tried the following, but it hard-codes the line number and it prints the complete range:
sed -n '/Section1/,4{p}'
Here I was able to remove the hard-coding, but I am unable to print only the last line, and the next section name is printed as well:
sed -n '/Section1/ , /Section./{p}'
This might work for you (GNU sed):
sed '$b;N;/\nSection/P;D' file
Make a moving window of two lines: print the first line if the second line begins with Section, and always print the last line of the file.
For the last line of a specific section use:
sed -n '/^Section1/{:a;h;$!{n;/^\S/!ba};x;s/^\s*//p}' file
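To see the two-line window in action, here is a sketch on a shortened copy of the question's data. I have split the script into -e pieces on the assumption that this is friendlier to non-GNU sed; the logic is the same as the first command above:

```shell
# For each pair of adjacent lines: print the first one when the second one
# starts a new section; the very last line of the file is always printed.
last_lines=$(printf '%s\n' 'Section1' 'George, 1998-1995' 'Simon, 1988-1960' \
                           'Section2' 'Gery, 2019-2015' 'Thomson, 1978-1965' |
  sed -e '$b' -e 'N' -e '/\nSection/P' -e 'D')
printf '%s\n' "$last_lines"
```

Note that this prints the last line of every section, not just one chosen section.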
A GNU awk solution:
awk -v RS='Section' '$1=="1" {print $(NF-1),$NF}' file
Simon, 1988-1960
By setting the record separator (RS) to Section, awk works on one block at a time. Since Section itself is stripped off, the block for Section1 begins with 1; for the block whose first field is 1, print the second-to-last and the last field.
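A multi-character RS is not guaranteed by POSIX awk, so if gawk is not available, something like the following line-by-line sketch may do. The helper name last_of_section and the variable want are made up for this example:

```shell
# last_of_section is a hypothetical helper: it reads the file on stdin and
# prints the last line of the section whose header equals the first argument.
last_of_section() {
  awk -v want="$1" '
    /^Section/ { if (found) exit; found = ($0 == want); next }
    found      { last = $0 }
    END        { if (last != "") print last }
  '
}

sample=$(printf '%s\n' 'Section1' 'George, 1998-1995' 'Peter, 1999-1990' \
                       'Simon, 1988-1960' 'Section2' 'Gery, 2019-2015' \
                       'John, 1984-1983' 'Thomson, 1978-1965')
printf '%s\n' "$sample" | last_of_section Section1
```

Unlike the field-based version, this keeps the whole last line intact even if it contains a different number of words.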
You may consider using
sed -n '/^Section1$/,/^Section[0-9]*$/{:a;h;n;/^Section[0-9]*$/!ba;x;s/^[ \t]*//;p}' file > newfile
Details
-n - the switch suppresses default line output mode
/^Section1$/,/^Section[0-9]*$/ - a block of lines between a line that is equal to Section1 and a line that fully matches a Section and any 0 or more digits pattern (the next {...} group of commands relates to the range matched with this)
:a - sets a label named a
h - copies the current line into hold buffer
n - discards the current pattern space value and reads the next line into it
/^Section[0-9]*$/!ba - if the pattern space value does not match the end block line go back to label a
x - else, once we get to the last line, the previous one is in hold space, so x is used to swap hold and pattern space
s/^[ \t]*// - remove initial whitespace
p - print the pattern space.
Regex:
(Section1)((\n.*,.*)*\n\s*)(?'lastLine'.*)
I did not understand exactly what you want to do with the result, so I cannot tell you the exact sed command.

Matching patterns across lines

Suppose I have a file which contains:
something
line=1
file=2
other
lines
ignore
something
line=2
file=3
other
lines
ignore
Eventually, I want a unique list of the line and file combinations in each section. In the first stage I am trying to get sed to output just those lines combined into one line, like
line=1file=2
line=2file=3
Then I can use sort and uniq.
So I am trying
sed -n -r 's/(line=)(.*?)(\r)(file=)(.*?)(\r)/\1\2\4\5/p' sample.txt
(It isn't necessarily just a number after each)
But it won't match across the lines. I have tried \n and \r\n but it doesn't seem to be the style of new line, since:
sed -n -r 's/(line=)(.*?)(\r)/\1\2/p' sample.txt
Will output the "line=" lines, but I just can't get it to span the new line, and collect the second line as well.
By default, sed operates on one newline-delimited chunk at a time, so a pattern cannot match across lines unless you explicitly pull more lines into the pattern space. Some sed implementations support a -z option, which makes sed operate on chunks separated by the ASCII NUL character instead of the newline character (this could work for small files, assuming NUL characters won't affect the pattern you want to match).
There are also some sed commands that can be used for multiline processing:
sed -n '/line=/{N;s/\n//p}'
The N command appends the next line to the current chunk being processed (which has to match line= in this case).
s/\n//p then deletes the newline character, so that you get the output as a single line.
If your input has DOS-style line endings, first convert it to Unix style (see Why does my tool output overwrite itself and how do I fix it?) or take care of \r as well:
sed -n '/line=/{N;s/\r\n//p}'
Note that these commands were tested on GNU sed; syntax may vary for other implementations.
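Putting that together with the sort/uniq step from the question, the whole pipeline might look like this sketch. The input is a shortened version of the question's file with one repeated section added; I use sort -u in place of sort followed by uniq:

```shell
# Join each line= line with the line that follows it, then keep unique pairs.
pairs=$(printf '%s\n' something line=1 file=2 other something line=2 file=3 \
                      ignore something line=1 file=2 |
  sed -n -e '/line=/{' -e 'N' -e 's/\n//p' -e '}' | sort -u)
printf '%s\n' "$pairs"
```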

Add any number of whitespaces to file

I have a plain text file:
line1_text
line2_text
I need to add a number of whitespaces between the two lines.
Adding 10 whitespaces is easy.
But say I need to add 10000 whitespaces, how would I achieve that using sed?
P.S. This is for experimental purposes
There undoubtedly is a sed method to do this but, since sed does not have any natural understanding of arithmetic, it is not a natural choice for this problem. By contrast, awk understands arithmetic and can readily, for example, print an empty line n times for any integer value of n.
As an example, consider this input file:
$ cat infile
line1_text
line2_text
This code will add as many blank lines as you like before any line that contains the string line2_text:
$ awk -v n=5 '/line2_text/{for (i=1;i<=n;i++)print""} 1' infile
line1_text





line2_text
If you want 10,000 blank lines instead of 5, then replace n=5 with n=10000.
How it works
-v n=5
This defines an awk variable n with value 5.
/line2_text/{for (i=1;i<=n;i++)print""}
Every time a line matches the regex line2_text, a for loop runs, which prints an empty line n times.
1
This is awk's shorthand for print-the-line and it causes every line from input to be printed to the output.
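If "whitespaces" in the question means literal space characters rather than blank lines, the same loop can be adapted. This is only a sketch under that assumption; it puts one line made of n spaces before line2_text:

```shell
n=10000
# Same loop as above, but it emits n spaces followed by one newline.
padded=$(printf 'line1_text\nline2_text\n' |
  awk -v n="$n" '/line2_text/ { for (i = 1; i <= n; i++) printf " "; print "" } 1')
# Report the length of the inserted line instead of dumping 10000 spaces.
printf '%s\n' "$padded" | awk 'NR == 2 { print length($0) }'
```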
This might work for you (GNU sed):
sed -r '/line1_text/{x;s/.*/ /;:a;ta;s/ /&\n/10000;tb;s/^[^\n]*/&&/;ta;:b;s/\n.*//;x;G}' file
This appends the hold space to the first line. The hold space is manipulated to hold the required number of spaces by a looping mechanism based on powers of 2. This may produce more than necessary and the remainder are chopped off using a linefeed as a delimiter.
To change spaces to newlines, use:
sed -r '/line1_text/{x;s/.*/ /;:a;ta;s/ /\n&/10000;tb;s/^[^\n]*/&&/;ta;:b;s/\n.*//;s/ /\n/g;x;G}' file
In essence the same can be achieved using this (however it is very slow for large numbers):
sed -r '/line1_text/{x;:a;/ {20}/bb;s/^/ /;ta;:b;x;G}' file

Can I use the sed command to replace multiple empty lines with one empty line?

I know there is a similar question on SO, How can I replace multiple empty lines with a single empty line in bash?, but my question is: can this be implemented using just the sed command?
Thanks
Give this a try:
sed '/^$/N;/^\n$/D' inputfile
Explanation:
/^$/N - match an empty line and append the next line to the pattern space.
; - command delimiter, allows multiple commands on one line, can be used instead of separating commands into multiple -e clauses for versions of sed that support it.
/^\n$/D - if the pattern space contains only a newline in addition to the one at the end of the pattern space, in other words a sequence of more than one newline, then delete the first newline (more generally, the beginning of pattern space up to and including the first included newline)
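A quick way to convince yourself is to run it on a tiny made-up input with three empty lines in a row. Here the two commands are passed as separate -e expressions, which should behave the same as the ; form:

```shell
# Three consecutive empty lines between a and b collapse into one.
squeezed=$(printf 'a\n\n\n\nb\n' | sed -e '/^$/N' -e '/^\n$/D')
printf '%s\n' "$squeezed"
```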
You can do this by removing empty lines first and then appending an empty line with the G command:
sed '/^$/d;G' text.txt
Edit2: the above command adds an empty line after every remaining line, not only between paragraphs. If this is not desired, you could do:
sed -n '1{/^$/p};{/./,/^$/p}'
Or, if you don't mind that all leading empty lines will be stripped, it may be written as:
sed -n '/./,/^$/p'
since the first expression just evaluates the first line, and prints it if it is blank.
Here, the -n option suppresses automatic printing of the pattern space, /./,/^$/ defines a range from a line with at least one character to the next empty line, and p prints the lines in that range.
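As a small check of the range form (the input is invented for illustration), note how the leading empty lines disappear entirely while the run in the middle shrinks to one:

```shell
# Two leading empty lines are dropped; the two between a and b become one.
ranged=$(printf '\n\na\n\n\nb\n' | sed -n '/./,/^$/p')
printf '%s\n' "$ranged"
```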

bash: filter away consecutive lines from text file

I want to delete every instance of a paragraph from many files. By paragraph I mean a sequence of lines.
For example:
my first line
my second line
my third line
the fourth
5th and last
The problem is that I only want to delete them when they appear as a group. For example, if my first line appears alone, I don't want to delete it.
@OP, I see you accepted the answer where your paragraph sentences are hard-coded, so I assume those paragraphs are always the same? If that's true, you can use grep. Store the paragraph you want to get rid of in a file, e.g. "filter", then use the -f and -v options of grep to do the job. Note that this drops every line that matches any line of the filter, even when that line appears on its own.
grep -v -f filter file
If you are able to use Perl, you can do it in one line like this:
perl -0777 -pe 's/my first line\nmy second line\nmy third line\nthe fourth\n5th and last\n//g' paragraph_file
the explanation is in perlrun:
The special value 00 will cause Perl to slurp files in paragraph mode. The value 0777 will cause Perl to slurp files whole because there is no legal byte with that value.
Sample input:
my first line
my second line
my third line
the fourth
5th and last
hey
my first line
my second line
my third line
the fourth
5th and last
hello
my first line
Output:
$ perl -0777 -pe 's/my first line\nmy second line\nmy third line\nthe fourth\n5th and last\n//g' paragraph_file
hey
hello
my first line
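If Perl is not an option, the same slurp-the-whole-file idea can be sketched in plain awk. This is an illustration, not a tested tool: the scratch file names are placeholders, it assumes the paragraph file is not empty, and it matches the paragraph literally (no regex) and only at line starts:

```shell
# Scratch files for the sketch; the names are placeholders.
dir=$(mktemp -d)
printf '%s\n' 'my first line' 'my second line' 'my third line' \
              'the fourth' '5th and last' > "$dir/paragraph"
{
  cat "$dir/paragraph"; echo 'hey'
  cat "$dir/paragraph"; echo 'hello'
  echo 'my first line'
} > "$dir/input"

filtered=$(awk '
  NR == FNR { para = para $0 "\n"; next }  # first file: the block to delete
  { text = text $0 "\n" }                  # second file: slurp everything
  END {
    needle = "\n" para                     # leading newline pins the match to a line start
    text = "\n" text
    while ((i = index(text, needle)) > 0) {
      out = out substr(text, 2, i - 1)     # keep whatever precedes the block
      text = "\n" substr(text, i + length(needle))
    }
    printf "%s", out substr(text, 2)
  }
' "$dir/paragraph" "$dir/input")
rm -r "$dir"
printf '%s\n' "$filtered"
```

Because the match is done on the whole block, a lone "my first line" survives, which is what the question asks for.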
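You can do it with sed (note that this one-liner removes consecutive duplicate lines, like uniq, rather than a specific paragraph):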
sed '$!N; /^\(.*\)\n\1$/!P; D' file_to_filter
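To make clear what that one-liner does, here is a sketch on a made-up input: it compares each line with the one after it and prints a line only when the next one differs, so runs of identical lines collapse to a single copy:

```shell
# Consecutive duplicates are collapsed; distinct lines pass through.
deduped=$(printf 'a\na\nb\nb\nb\nc\n' | sed '$!N; /^\(.*\)\n\1$/!P; D')
printf '%s\n' "$deduped"
```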