linux + cut file until specific word - perl

I have a file with the following format.
How can I cut the file at the line that starts with the number 2 (not including that line)?
The 2 may be preceded by spaces or tabs.
Note: the implementation can be a ksh, awk, sed or perl one-liner, etc.
file:
* 0
Any text
Any text
.
.
1
Any text
Any text
.
.
2
Any text
Any text
.
.
3
Any text
Any text
.
.

Just exit when you encounter the line you want to stop at:
awk '/^[[:space:]]*2/{exit}1' file
Update: [[:space:]] will take care of spaces, tabs etc.
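To see why the pattern needs the ^ anchor, here is a small check on made-up input where an ordinary text line also contains a 2. The sample data is an assumption, not the asker's file, and [ \t] is used in place of [[:space:]] only so the sketch also runs on older awk builds:

```shell
# Sample input: the "2" marker line is indented with a space and a tab,
# and an earlier text line happens to contain the digit 2.
input=$(printf '0\ntext with 2 in it\n1\nmore text\n \t2\nafter\n3\n')

# Anchored: stop at the first line whose first non-blank character is 2.
anchored=$(printf '%s\n' "$input" | awk '/^[ \t]*2/{exit} 1')

# Unanchored: stops at the first line containing a 2 anywhere.
unanchored=$(printf '%s\n' "$input" | awk '/[ \t]*2/{exit} 1')

printf 'anchored:\n%s\n\nunanchored:\n%s\n' "$anchored" "$unanchored"
```

The anchored form keeps the four lines before the indented 2; the unanchored form stops right after the first line.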

Use sed to delete everything after (and including) the matching line
$ sed '/^[[:space:]]*2/,$d' input.txt
The [[:space:]] character class matches both spaces and tabs.

You can "play" with a flag, that deactivates when the line is found:
awk 'BEGIN{f=1} /^2/{f=0} f' file
BEGIN{f=1} initializes the flag as true. /^2/{f=0} unsets it when a line starts with 2. The bare f prints the line for as long as the flag is true.
To also match a 2 that is preceded by spaces or tabs, keep the anchor and allow leading whitespace:
awk 'BEGIN{f=1} /^[[:space:]]*2/{f=0} f' file

Perl one-liner:
perl -pwe 'exit if $_ =~ /^\s*2/' file
This allows for any amount of whitespace (spaces or tabs) between the start of the line and the number 2.

Use the Q command (a GNU sed extension) so that sed quits without printing the matching line and doesn't read the rest of the file once it has found the appropriate end line:
sed '/^\s*2/Q' file

Related

GREP Print Blank Lines For Non-Matches

I want to extract strings between two patterns with GREP, but when no match is found, I would like to print a blank line instead.
Input
This is very new
This is quite old
This is not so new
Desired Output
is very
is not so
I've attempted:
grep -o -P '(?<=This).*?(?=new)'
But this does not preserve the second blank line in the above example. Have searched for over an hour, tried a few things but nothing's worked out.
Will happily use a solution in SED if that's easier!
You can use
#!/bin/bash
s='This is very new
This is quite old
This is not so new'
sed -En 's/.*This(.*)new.*|.*/\1/p' <<< "$s"
See the online demo yielding
is very
is not so
Details:
E - enables POSIX ERE regex syntax
n - suppresses default line output
s/.*This(.*)new.*|.*/\1/ - finds any text, This, any text (captured into Group 1, \1), new, and then any text again, or else the whole string (in sed, the line), and replaces the match with the Group 1 value.
p - prints the result of the substitution.
And this is what you need for your actual data:
sed -En 's/.*"user_ip":"([^"]*).*|.*/\1/p'
See this online demo. The [^"]* matches zero or more chars other than a " char.
With your shown samples, please try following awk code.
awk -F'This\\s+|\\s+new' 'NF==3{print $2;next} NF!=3{print ""}' Input_file
OR
awk -F'This\\s+|\\s+new' 'NF==3{print $2;next} {print ""}' Input_file
Explanation: This\\s+ and \\s+new are set as alternative field separators for every line of Input_file. In the main program, if NF (the number of fields) is 3, the 2nd field is printed and next skips to the next line. Otherwise (NF is not 3), a blank line is printed.
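A quick way to see why the NF==3 test works is to print the field count for each sample line. This is only a sketch: [ \t]+ stands in for \s+ so that it does not depend on gawk, and the bracketed output format is made up for illustration.

```shell
# Print the field count and the second field for each sample line.
# A separator match at the start or end of a line yields an empty field,
# so a full This...new line splits into exactly three fields.
out=$(printf 'This is very new\nThis is quite old\nThis is not so new\n' |
  awk -F'This[ \t]+|[ \t]+new' '{ print NF ": [" $2 "]" }')
printf '%s\n' "$out"
```

Lines that contain both This and new report 3 fields; the middle line reports 2, which is why it falls through to the blank-line branch.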
sed:
sed -E '
/This.*new/! s/.*//
s/.*This(.*)new.*/\1/
' file
first line: for lines not matching "This.*new", remove all characters, leaving a blank line
second line: for lines matching the pattern, keep only the "middle" text
Note that this is not the PCRE non-greedy match: the line
This is new but that is not new
will produce the output
is new but that is not
To continue to use PCRE, use perl:
perl -lpe '$_ = /This(.*?)new/ ? $1 : ""' file
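The greedy behaviour of the two-command sed version can be checked on a few made-up lines. This is a sketch under the assumption that the sed in use supports -E (GNU and BSD sed both do):

```shell
# Lines 1 and 3 match This...new; line 2 does not and becomes empty.
# Line 3 contains "new" twice, so the greedy (.*) runs to the last one.
out=$(printf 'This is very new\nThis is quite old\nThis is new but that is not new\n' |
  sed -E -e '/This.*new/!s/.*//' -e 's/.*This(.*)new.*/\1/')
printf '%s\n' "$out"
```

Note that the captured text keeps the spaces on either side, since they sit between This and new.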
This might work for you:
sed -E 's/.*This(.*)new.*|.*/\1/' file
If the first match is made, the line is replaced by everything between This and new.
Otherwise the second match will remove everything.
N.B. The substitution will always match one of the conditions. The solution was suggested by Wiktor Stribiżew.

sed/awk conditionally delete lines from the start and end of a file

I have several thousand text files which might start with
"
Start of text
but not all of them have the same number of line breaks and not all of them have "
I would like to remove " (if it exists) and any line breaks, if any.
(and the ending too but I'll probably figure it out if you show me how to remove it from the start)
End of file...
"
perl is also ok
my attempt would be something like this with fish shell. awk is probably more performant though
if head -1 | grep \"
sed -i 1d $file
if head -1 | grep '^\r\n$'
sed -i 1d $file
if head -1 | grep '^\r\n$'
sed -i 1d $file
if head -1 | grep '^\r\n$'
sed -i 1d $file
that might actually work I'm going to try it
The simplest way to do this is a 2-pass approach where on the first pass you figure out the beginning and ending line numbers for the "good" lines and on the second you print the lines between those numbers:
awk '
NR==FNR { if (NF && !/^"$/) { if (!beg) beg=NR; end=NR } next }
(beg <= FNR) && (FNR <= end)
' file file
For example given this input:
$ cat file
"
Start of text
but not all of them have the same number of line breaks and not all of them have "
I would like to remove " (if it exists) and any line breaks, if any.
(and the ending too but I'll probably figure it out if you show me how to remove it from the start)
End of file...
"
We can do the following using any awk in any shell on every UNIX box:
$ awk 'NR==FNR{if (NF && !/^"$/) {if (!beg) beg=NR; end=NR} next} (beg <= FNR) && (FNR <= end)' file file
Start of text
but not all of them have the same number of line breaks and not all of them have "
I would like to remove " (if it exists) and any line breaks, if any.
(and the ending too but I'll probably figure it out if you show me how to remove it from the start)
End of file...
You can use ed to do it in a single pass, too:
Something like
printf '%s\n' '1g/^"$/.,/^./-1d' '$g/^"$/?^.?+1,$d' w | ed -s "$file"
Translated: If the first line is nothing but a quote, delete it and any following empty lines. If the last line is nothing but a quote, delete all preceding empty lines and it. Finally write the file back to disk.
This might work for you (GNU sed):
sed '1{/^"$/d};/\S/!d;:a;${/^"$/Md};/\S/{n;ba};$d;N;ba' file
Delete the first line if it contains only a single ".
Delete all empty lines from the start of the file.
Form a loop for the remainder of the file.
Delete the last line(s) if it/they contains a single ".
If the current line(s) is/are not empty, print it/them, fetch the next and repeat.
If the current line(s) is/are the last and empty, delete it/them.
The current line(s) is/are empty so append the next line and repeat.
N.B. This is a single pass solution and allows for empty lines within the body of the file.
Alternative, memory intensive:
sed -Ez 's/^"?\n+//;s/\n+("\n)?$/\n/' file
In addition to the two-pass processing, here's a one-pass:
awk '!/^"*$/{print b $0;f=1;b=""} f&&/^"*$/{b=b $0 ORS}' file
The program consists of two small parts:
(1) Whenever there's content (lines that contain more than "), print possibly buffered lines and the current input line, set a flag that content has started, and clear the buffer.
(2) If content had started (f), but the current line doesn't contain any content, we may have reached the end, so we buffer these empty lines. Later, (1) will print them or they will be discarded on EOF.
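A small run of this buffering approach on made-up input (a quote line and blank lines at both ends, plus one blank line inside the text) might look like this:

```shell
# Leading quote and blank lines are skipped because f is not set yet.
# The inner blank line is buffered and flushed when content follows;
# the trailing blank and quote lines stay in the buffer and are dropped.
out=$(printf '"\n\nStart of text\n\nEnd of file...\n\n"\n' |
  awk '!/^"*$/     { print b $0; f = 1; b = "" }
       f && /^"*$/ { b = b $0 ORS }')
printf '%s\n' "$out"
```

The blank line between the two text lines survives, while everything before the first and after the last content line is removed.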

How to remove empty lines to one empty line between sentences in text files?

I have a text file with many empty lines between sentences. I tried sed, gawk and grep, but they don't work. :( What can I do now? Thanks.
Myfile:
a
b
c
.
(several empty lines)
d
e
f
g
.
(several empty lines)
h
i
j
k
.
Desired file:
a
b
c
.
(one empty line)
d
e
f
g
.
(one empty line)
h
i
j
k
.
You can use awk for this:
awk 'BEGIN{prev="x"}
/^$/ {if (prev==""){next}}
{prev=$0;print}' inputFile
or the compressed one liner:
awk 'BEGIN{p="x"}/^$/{if(p==""){next}}{p=$0;print}' inFl
This is a simple state machine that collapses multi-blank-lines into a single one.
The basic idea is this. First, set the previous line to be non-empty.
Then, for every line in the file, if it and the previous one are blank, just throw it away.
Otherwise, set the previous line to that value, print the line, and carry on.
Sample transcript, the following command:
$ echo '1
2
3
4
5
6
7
8
9
10' | awk 'BEGIN{p="x"}/^$/{if(p==""){next}}{p=$0;print}'
outputs:
1
2
3
4
5
6
7
8
9
10
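Since the blank lines are the whole point here, the same command can also be checked with the blank lines written out explicitly via printf (the input below is made up for illustration):

```shell
# Two blank lines after 1 and three after 2; each run should collapse to one.
out=$(printf '1\n\n\n2\n\n\n\n3\n' |
  awk 'BEGIN { p = "x" } /^$/ { if (p == "") next } { p = $0; print }')
printf '%s\n' "$out"
```

Each run of empty lines comes out as a single empty line between the numbers.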
Keep in mind that this is for truly blank lines (no content). If you're trying to collapse lines that have an arbitrary number of spaces or tabs, that will be a little trickier.
In that case, you could pipe the file through something like:
sed 's/^\s*$//'
to ensure lines with just whitespace become truly empty.
In other words, something like:
sed 's/^\s*$//' infile | awk 'my previous awk command'
To suppress repeated empty output lines with GNU cat:
cat -s file1 > file2
Here's one way using sed:
sed ':a; N; $!ba; s/\n\n\+/\n\n/g' file
Otherwise, if you don't mind a trailing blank line, all you need is:
awk '1' RS= ORS="\n\n" file
The Perl solution is even shorter:
perl -00 -pe '' file
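The trailing blank line mentioned for the paragraph-mode awk command can be seen by counting output lines. This sketch uses -v assignments instead of trailing RS= ORS= operands, which should be equivalent here; the input is made up:

```shell
# Paragraph mode (empty RS): runs of blank lines separate records.
# ORS adds a blank line after every record, including the last one.
count=$(printf 'a\nb\n\n\n\nc\n' | awk -v RS= -v ORS='\n\n' '1' | wc -l)
echo "output lines: $((count))"
```

The output has five lines (a, b, blank, c, blank), so the three blank input lines collapse to one and an extra blank line is appended at the end.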
You could do like this also,
awk -v RS="\0" '{gsub(/\n\n+/,"\n\n");}1' file
Explanation:
RS="\0" Once we set the null character as the Record Separator value, awk will read the whole file as a single record.
gsub(/\n\n+/,"\n\n"); this replaces one or more blank lines with a single blank line. Note that \n\n regex matches a blank line along with the previous line's new line character.
Here is another awk:
awk -v p=1 '$0=="" && p=="" {next} {print; p=$0}' file

sed: replace pattern only if followed by empty line

I need to replace a pattern in a file, only if it is followed by an empty line. Suppose I have following file:
test
test
test
...
the following command would replace all occurrences of test with xxx
cat file | sed 's/test/xxx/g'
but I need to replace test only if the next line is empty. I have tried matching a hex code, but that does not work:
cat file | sed 's/test\x0a/xxx/g'
The desired output should look like this:
test
xxx
xxx
...
Suggested solutions for sed, perl and awk:
sed
sed -rn '1h;1!H;${g;s/test([^\n]*\n\n)/xxx\1/g;p;}' file
I got the idea from sed multiline search and replace. Basically slurp the entire file into sed's hold space and do global replacement on the whole chunk at once.
perl
$ perl -00 -pe 's/test(?=[^\n]*\n\n)$/xxx/m' file
-00 triggers paragraph mode which makes perl read chunks separated by one or several empty lines (just what OP is looking for). Positive look ahead (?=) to anchor substitution to the last line of the chunk.
Caveat: -00 will squash multiple empty lines into single empty lines.
awk
$ awk 'NR==1 {l=$0; next}
/^$/ {gsub(/test/,"xxx", l)}
{print l; l=$0}
END {print l}' file
Basically store previous line in l, substitute pattern in l if current line is empty. Print l. Finally print the very last line.
Output in all three cases
test
xxx
xxx
...
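A check of the awk version with the blank lines written out explicitly. The input is an assumption about the intended sample, with a made-up final line "end":

```shell
# Lines 2 and 4 are each followed by an empty line; line 1 is not.
out=$(printf 'test\ntest\n\ntest\n\nend\n' |
  awk 'NR == 1 { l = $0; next }
       /^$/    { gsub(/test/, "xxx", l) }
               { print l; l = $0 }
       END     { print l }')
printf '%s\n' "$out"
```

Only the occurrences directly above an empty line are replaced; the first test and the final line are left alone.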
This might work for you (GNU sed):
sed -r '$!N;s/test(\n\s*)$/xxx\1/;P;D' file
Keep a window of 2 lines throughout the length of the file and if the second line is empty and the first line contains the pattern then make a substitution.
Using sed
sed -r ':a;$!{N;ba};s/test([^\n]*\n(\n|$))/xxx\1/g'
explanation
:a # set label a
$ !{ # if not end of file
N # Add a newline to the pattern space, then append the next line of input to the pattern space
b a # Unconditionally branch to label. The label may be omitted, in which case the next cycle is started.
}
# in short, the command :a;$!{N;ba} above reads the whole file into the pattern space.
s/test([^\n]*\n(\n|$))/xxx\1/g # replace the keyword if the next line is empty (\n\n) or the input ends there ($)

sed: joining lines depending on the second one

I have a file that, occasionally, has split lines. The split is signaled by the fact that the continuation line starts with '+' (possibly preceded by spaces).
line 1
line 2
+ continue 2
line 3
...
I'd like to join the split line back:
line 1
line 2 continue 2
line 3
...
using sed. I'm not clear on how to join a line with the preceding one.
Any suggestion?
This might work for you:
sed 'N;s/\n\s*+//;P;D' file
These are actually four commands:
N
Append line from the input file to the pattern space
s/\n\s*+//
Remove newline, following whitespace and the plus
P
print line from the pattern space until the first newline
D
delete line from the pattern space until the first newline, e.g. the part which was just printed
The relevant manual page parts are
Selecting lines by numbers
Addresses overview
Multiline techniques - using D,G,H,N,P to process multiple lines
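A portable variant of the same N/P/D window can be tried on the sample input. This is a sketch, not the answer's exact command: $!N is used so that N is never run on the last line, and [[:space:]] replaces the GNU-only \s.

```shell
# Keep a two-line window; when the second line starts with optional
# whitespace and a +, remove the newline, the whitespace and the +.
out=$(printf 'line 1\nline 2\n   + continue 2\nline 3\n' |
  sed '$!N;s/\n[[:space:]]*+//;P;D')
printf '%s\n' "$out"
```

The continuation line is folded into line 2, and lines 1 and 3 pass through unchanged.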
Doing this in sed is certainly a good exercise, but it's pretty trivial in perl:
perl -0777 -pe 's/\n\s*\+//g' input
I'm not partial to sed so this was a nice challenge for me.
sed -n '1{h;n};/^ *+ */{s// /;H;n};{x;s/\n//g;p};${x;p}'
In awk this is approximately:
awk '
NR == 1 {hold = $0; next}
/^ *\+/ {$1 = ""; hold=hold $0; next}
{print hold; hold = $0}
END {if (hold) print hold}
'
If the last line is a "+" line, the sed version will print a trailing blank line. Couldn't figure out how to suppress it.
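The awk version can be exercised on input with two consecutive "+" lines. This is a sketch on made-up data; the END test is written as hold != "" here so that a final line consisting of 0 would still be printed:

```shell
# hold carries the pending line; each "+" line is appended to it
# after its first field (the "+") has been blanked out.
out=$(printf 'line 1\nline 2\n+ continue 2\n+ continue 2 even more\nline 3\n' |
  awk 'NR == 1 { hold = $0; next }
       /^ *\+/ { $1 = ""; hold = hold $0; next }
               { print hold; hold = $0 }
       END     { if (hold != "") print hold }')
printf '%s\n' "$out"
```

Both continuation lines end up on the "line 2" line, since hold is only printed when a non-continuation line arrives.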
You can use Vim in Ex mode:
ex -sc g/+/-j -cx file
g global search
- select previous line
j join with next line
x save and close
Different use of hold space with POSIX sed... to load the entire file into the hold space before merging lines.
sed -n '1x;1!H;${g;s/\n\s*+//g;p}'
1x on the first line, swap the line into the empty hold space
1!H on non-first lines, append to the hold space
$ on the last line:
g get the hold space (the entire file)
s/\n\s*+//g remove each newline (plus any following whitespace) that precedes a +, together with the + itself
p print everything
Input:
line 1
line 2
+ continue 2
+ continue 2 even more
line 3
+ continued
becomes
line 1
line 2 continue 2 continue 2 even more
line 3 continued
This (or potong's answer) might be more interesting than a sed -z implementation if other commands are needed for other manipulations of the data: you can simply stick them in before 1!H. sed -z, by contrast, immediately loads the entire file into the pattern space, which means you aren't manipulating single lines at any point. The same goes for perl -0777.
In other words, if you want to also eliminate comment lines starting with *, add in /^\s*\*/d to delete the line
sed -n '1x;/^\s*\*/d;1!H;${g;s/\n\s*+//g;p}'
versus:
sed -z 's/\n\s*+//g;s/\n\s*\*[^\n]*\n/\n/g'
The former's accumulation in the hold space line by line keeps you in classic sed line processing land, while the latter's sed -z dumps you into what could be some painful substring regexes.
But that's sort of an edge case, and you could always just pipe sed -z back into sed. So +1 for that.
Footnote for internet searches: This is SPICE netlist syntax.
A solution for versions of sed that can read NUL separated data, like here GNU Sed's -z:
sed -z 's/\n\s*+//g'
Compared to potong's solution this has the advantage of being able to join multiple lines that start with +. For example:
line 1
line 2
+ continue 2
+ continue 2 even more
line 3
becomes
line 1
line 2 continue 2 continue 2 even more
line 3