How to search only the first pattern range in sed

My input file looks something like this
Start1
some text
that I want
modified
Pattern1
some other text
which I do not want
to modify
End1
Start1
Pattern2
End1
My sed pattern looks like this
/Start1/,/Pattern1/c\
Start1\
Modification text here\
Pattern1\
additional modifications
I only want the text within the first Start1...End1 range to be modified.
Additionally, I am specifying Pattern1, which does not exist in the second range.
I run:
sed -i -f <sed_file> <input_file>
However, my output is shown below. For some reason, it wipes out the second range even though Pattern1 does not exist in it.
Start1
Modification text here
Pattern1
additional modifications
some other text
which I do not want
to modify
End1
Expected result
Start1
Modification text here
Pattern1
additional modifications
some other text
which I do not want
to modify
End1
Start1
Pattern2
End1

Try this one (GNU sed):
sed ':A;/Start1/!b;N;/Pattern1/!bA;s/\(Start1\n\)\(.*\)\(\nPattern1\)/\1Modification text here\3\nadditional modifications/' infile
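A quick way to check the one-liner is to feed it the sample input from the question. This is a sketch assuming GNU sed and bash: the script relies on \n in the replacement and on GNU sed printing the pattern space when N runs out of input, which is what keeps the final Start1...End1 block intact.

```shell
#!/usr/bin/env bash
# Sample input from the question, one line per argument.
input=$(printf '%s\n' Start1 'some text' 'that I want' modified Pattern1 \
  'some other text' 'which I do not want' 'to modify' End1 \
  Start1 Pattern2 End1)

# The one-liner from the answer (GNU sed): keep appending lines after
# Start1 until Pattern1 shows up, then rewrite the collected block.
result=$(printf '%s\n' "$input" |
  sed ':A;/Start1/!b;N;/Pattern1/!bA;s/\(Start1\n\)\(.*\)\(\nPattern1\)/\1Modification text here\3\nadditional modifications/')

printf '%s\n' "$result"
```

The one-liner rewrites any Start1 block that eventually reaches a Pattern1 line; in this sample only the first block does, so the second one is printed unchanged.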

In GNU sed:
sed -e '/START/,/END/c TEXT'
is not the same as
sed -e '/START/,/END/{c TEXT' -e '}'
The first stops copying the range to the output stream and emits one instance of TEXT into the output stream when the end of the range is reached. The second replaces each line in the range with TEXT.
Your issue is that the second range is omitted from the output stream even though the end of the second range is never reached: /START/,/END/c where /END/ never appears is effectively /START/,$d.
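A small demonstration of the two forms, as a sketch assuming GNU sed and bash (the one-line c TEXT syntax is a GNU extension). The input lines a, START, b, END and c are made up for illustration:

```shell
#!/usr/bin/env bash
# Made-up input: a range from START to END with one line in between.
input=$'a\nSTART\nb\nEND\nc'

# c on the range itself: the range is dropped and TEXT is printed once,
# when the END line closes the range.
range_c=$(printf '%s\n' "$input" | sed '/START/,/END/c TEXT')

# c inside braces: c has no address of its own there, so each line of
# the range is replaced by TEXT.
braced_c=$(printf '%s\n' "$input" | sed -e '/START/,/END/{c TEXT' -e '}')

# END never appears: the range stays open until the end of input, so
# TEXT is never printed and everything from START on disappears.
no_end=$(printf '%s\n' a START b | sed '/START/,/END/c TEXT')

printf '%s\n---\n' "$range_c" "$braced_c" "$no_end"
```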
The only solution I can come up with is clunky:
/Start1/,/Pattern1/{
/Pattern1/{
# Insert into output stream
i\
Start1\
Modification text here\
Pattern1\
additional modifications
# Read in the rest of the file
:a
$!N
$!ba
# Remove the original Pattern1 line from the pattern space
# (Remove first line and newline of pattern space)
s/^[^\n]*\n//
# Print pattern space and quit
q
}
# Delete lines in the range other than /Pattern1/
d
}
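To try the script above, save it to a file and run it with -f. This is a minimal sketch assuming GNU sed and bash; the temporary file exists only for the demo, and the i\ text lines are deliberately not indented so no leading whitespace ends up in the inserted text.

```shell
#!/usr/bin/env bash
# Write the sed script from the answer to a temporary file (GNU sed).
script=$(mktemp)
cat > "$script" <<'EOF'
/Start1/,/Pattern1/{
/Pattern1/{
i\
Start1\
Modification text here\
Pattern1\
additional modifications
# Read the rest of the file, drop the Pattern1 line, print and quit
:a
$!N
$!ba
s/^[^\n]*\n//
q
}
d
}
EOF

# Sample input from the question.
result=$(printf '%s\n' Start1 'some text' 'that I want' modified Pattern1 \
  'some other text' 'which I do not want' 'to modify' End1 \
  Start1 Pattern2 End1 | sed -f "$script")
rm -f "$script"
printf '%s\n' "$result"
```

One caveat: if Pattern1 were the very last line of the file, $!N would never run, the s command would find no newline to remove, and q would print the Pattern1 line a second time after the inserted text.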

Related

GREP Print Blank Lines For Non-Matches

I want to extract strings between two patterns with GREP, but when no match is found, I would like to print a blank line instead.
Input
This is very new
This is quite old
This is not so new
Desired Output
is very
is not so
I've attempted:
grep -o -P '(?<=This).*?(?=new)'
But this does not print the blank line for the second input line in the example above. I have searched for over an hour and tried a few things, but nothing has worked.
I will happily use a solution in sed if that's easier!
You can use
#!/bin/bash
s='This is very new
This is quite old
This is not so new'
sed -En 's/.*This(.*)new.*|.*/\1/p' <<< "$s"
This yields:
is very
is not so
Details:
E - enables POSIX ERE regex syntax
n - suppresses the default output of each line
s/.*This(.*)new.*|.*/\1/ - matches either any text, This, any text (captured into Group 1, \1), new and any text again, or, failing that, the whole string (in sed, the line), and replaces the match with the Group 1 value
p - prints the result of the substitution.
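The captured group keeps the spaces around the middle text, which is easy to miss when looking at the output. A sketch assuming GNU sed and bash; the final sed only wraps each line in brackets to make those spaces visible:

```shell
#!/usr/bin/env bash
s='This is very new
This is quite old
This is not so new'

# Either the first alternative matches and \1 holds the middle text,
# or only .* matches and \1 is empty, which prints a blank line.
out=$(sed -En 's/.*This(.*)new.*|.*/\1/p' <<< "$s")

# Wrap each output line in [ ] so leading/trailing spaces show up.
sed 's/.*/[&]/' <<< "$out"
```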
And this is what you need for your actual data:
sed -En 's/.*"user_ip":"([^"]*).*|.*/\1/p'
Here, [^"]* matches zero or more characters other than ".
With your shown samples, please try the following awk code (\s in a regex is a GNU awk extension):
awk -F'This\\s+|\\s+new' 'NF==3{print $2;next} NF!=3{print ""}' Input_file
OR
awk -F'This\\s+|\\s+new' 'NF==3{print $2;next} {print ""}' Input_file
Explanation: set This\\s+ OR \\s+new as the field separators for all lines of Input_file. In the main program, if NF (the number of fields) is 3, print the 2nd field (next then moves on to the next line). Otherwise, when NF is not 3, simply print a blank line.
sed:
sed -E '
/This.*new/! s/.*//
s/.*This(.*)new.*/\1/
' file
First line: on lines not matching "This.*new", remove all characters, leaving a blank line.
Second line: on lines matching the pattern, keep only the "middle" text.
Note that this is not the same as the PCRE non-greedy match: the line
This is new but that is not new
will produce the output
is new but that is not
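Running the two-command sed script on the sample plus the line above shows both the blank line for non-matching input and the greedy capture. A sketch assuming GNU sed and bash:

```shell
#!/usr/bin/env bash
# Sample lines from the question, plus the greedy example from the note.
out=$(printf '%s\n' 'This is very new' 'This is quite old' 'This is not so new' \
  'This is new but that is not new' |
  sed -E '
/This.*new/! s/.*//
s/.*This(.*)new.*/\1/
')

# The captured text keeps its surrounding spaces; the last line shows
# that (.*) runs up to the last "new" on the line.
printf '%s\n' "$out"
```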
To continue to use PCRE, use perl:
perl -lpe '$_ = /This(.*?)new/ ? $1 : ""' file
This might work for you:
sed -E 's/.*This(.*)new.*|.*/\1/' file
If the first alternative matches, the line is replaced by everything between This and new.
Otherwise, the second alternative matches the whole line and removes everything.
N.B. The substitution will always match one of the conditions. The solution was suggested by Wiktor Stribiżew.

Extract filename from multiple lines in unix

I'm trying to extract the name of the file that has been generated by a Java program. This Java program prints multiple lines, and I know exactly what the format of the file name is going to be. The information text that the Java program prints is as follows:
ABCASJASLEKJASDFALDSF
Generated file YANNANI-0008876_17.xml.
TDSFALSFJLSDJF;
I'm capturing the output in a variable and then applying a sed command in the following format:
sed -n 's/.*\(YANNANI.\([[:digit:]]\).\([xml]\)*\)/\1/p'
The result set is:
YANNANI-0008876_17.xml.
However, my problem is that I want the extraction of the file name to stop at .xml. The final dot should never be extracted.
Is there a way to do this using sed?
Let's look at what your capture group actually captures:
$ grep 'YANNANI.\([[:digit:]]\).\([xml]\)*' infile
Generated file YANNANI-0008876_17.xml.
That's probably not what you intended:
\([[:digit:]]\) captures just a single digit (and the capture group around it doesn't do anything)
\([xml]\)* is "any of x, m or l, 0 or more times", so it matches the empty string (as above – or the line wouldn't match at all!), x, xx, lll, mxxxxxmmmmlxlxmxlmxlm, xml, ...
There is no way the final period is removed, because nothing after the capture groups is matched, so it stays in the output.
What would make sense instead:
Match "digits or underscores, 0 or more": [[:digit:]_]*
Match .xml, literally (escape the period): \.xml
Make sure the rest of the line (just the period, in this case) is matched by adding .* after the capture group
So the regex for the string you'd like to extract becomes
$ grep 'YANNANI.[[:digit:]_]*\.xml' infile
Generated file YANNANI-0008876_17.xml.
and to remove everything else on the line using sed, we surround the regex with .*\( ... \).*:
$ sed -n 's/.*\(YANNANI.[[:digit:]_]*\.xml\).*/\1/p' infile
YANNANI-0008876_17.xml
This assumes you really meant . after YANNANI (any character).
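Applied to the captured Java output, the corrected command returns the file name without the final dot. A sketch in bash; java_output is a hypothetical variable standing in for wherever the output was captured:

```shell
#!/usr/bin/env bash
# Hypothetical variable holding the Java program's output (from the question).
java_output='ABCASJASLEKJASDFALDSF
Generated file YANNANI-0008876_17.xml.
TDSFALSFJLSDJF;'

# -n plus the p flag prints only the line where the substitution happened;
# the trailing .* consumes the final period so it is not kept.
file_name=$(printf '%s\n' "$java_output" |
  sed -n 's/.*\(YANNANI.[[:digit:]_]*\.xml\).*/\1/p')
echo "$file_name"
```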
You can call sed twice: first to extract and print the match, then to strip the trailing dot:
sed -n 's/.*\(YANNANI.\([[:digit:]]\).\([xml]\)*\)/\1/p' | sed 's/\.$//g'
The second sed removes the trailing . from every line printed by the first sed.
Or, if you prefer, an awk solution:
awk '/.*YANNANI.[0-9]+.[0-9]+.xml/{print substr($NF,1,length($NF)-1)}'
This prints the last field of every line that matches the regex, with its last character (the dot) cut off using substr.
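Both alternatives can be checked against the same sample. A sketch in bash; java_output is again a hypothetical variable holding the Java program's output. Note that the first sed here actually captures only YANNANI-00 and leaves the rest of the line after it untouched, which is why its output still contains the whole name plus the dot before the second sed runs:

```shell
#!/usr/bin/env bash
java_output='ABCASJASLEKJASDFALDSF
Generated file YANNANI-0008876_17.xml.
TDSFALSFJLSDJF;'

# Two sed calls: the first prints the matching line, the second strips
# a trailing dot.
via_sed=$(printf '%s\n' "$java_output" |
  sed -n 's/.*\(YANNANI.\([[:digit:]]\).\([xml]\)*\)/\1/p' | sed 's/\.$//g')

# awk: on matching lines, print the last field minus its last character.
via_awk=$(printf '%s\n' "$java_output" |
  awk '/.*YANNANI.[0-9]+.[0-9]+.xml/{print substr($NF,1,length($NF)-1)}')

printf '%s\n' "$via_sed" "$via_awk"
```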

Adding characters to the end of a line using SED

I am trying to add three exclamation points to the end of each line in this text file. I was told I need to convert all characters from lower case to upper case, and I have done so with the command below. I am not sure how to add the exclamation points in the same sed command.
cat "trump.txt" | sed 's/./\U&/g'
Consider this sample text:
$ cat ip.txt
this is a sample text
Another line
3 apples
With the sed command given in the question, which uppercases one character at a time (the g flag applies it to every character):
$ sed 's/./\U&/g' ip.txt
THIS IS A SAMPLE TEXT
ANOTHER LINE
3 APPLES
To add some other characters at end
$ sed 's/.*/\U&!!!/' ip.txt
THIS IS A SAMPLE TEXT!!!
ANOTHER LINE!!!
3 APPLES!!!
.* matches the entire line, and & holds that entire match in the replacement; \U (a GNU sed extension) uppercases the rest of the replacement.
The g flag is not needed, as the substitution happens only once per line.
Here is an awk version, where all the text is converted to uppercase and then three exclamation points are appended.
awk '{$0=toupper($0) "!!!"}1' input
THIS IS A SAMPLE TEXT!!!
ANOTHER LINE!!!
3 APPLES!!!
Explanation:
$0 is the entire line (record). toupper is an awk built-in function that converts its argument to uppercase, so toupper($0) is the uppercased line. Finally, the uppercased $0 followed by !!! is assigned back to $0 as its new value, and the trailing 1 (an always-true condition) makes awk print the modified line.
Breakdown of the command:
awk '{$0=toupper($0)}1' input # to make text uppercase.
awk '{$0= $0 "!!!"}1' input # to add "!!!" at the end of each line.
Or the bash way: the ^^ operator in ${line^^} (bash 4 and later) makes the contents of the variable uppercase.
while IFS= read -r line;
do
echo "${line^^}"'!!!' ;
done <input
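A sketch comparing the bash loop with the awk version on the sample text, assuming bash 4 or later for ${line^^}. It reads with IFS= read -r, a common defensive form that keeps leading whitespace and backslashes intact:

```shell
#!/usr/bin/env bash
# Requires bash 4+ for the ${var^^} uppercase expansion.
input=$'this is a sample text\nAnother line\n3 apples'

# Bash loop: uppercase each line and append three exclamation points.
loop_out=$(while IFS= read -r line; do
  printf '%s!!!\n' "${line^^}"
done <<< "$input")

# awk version from the answer above, for comparison.
awk_out=$(awk '{$0=toupper($0) "!!!"}1' <<< "$input")

printf '%s\n' "$loop_out"
```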

sed - process line twice or rewind

I'm editing data between two patterns, and I'm running into a problem where sed fails to match patterns that are back to back because the first pattern occurs in the same line as the second pattern.
The structure of my data looks something like this:
PATTERN2 Header PATTERN1
data
DATA_1 ...
DATA_2 ...
data
PATTERN2 Header PATTERN1
data
DATA_1 ...
DATA_2 ...
data
data
data
PATTERN2
...
and my sed command looks like this:
sed '/PATTERN1/,/PATTERN2/ {s/DATA_[12]/SUB/g}' myFile
The number of lines between the patterns is dynamic and there is no other reliable pattern to search on other than what is printed in the Header line. The Header line is the only indicator of the end of a block of data.
Is there an opposite of 'n' to "rewind" one line?
thanks!
This might work for you (GNU sed):
sed ':a;/PATTERN1/{:b;s/DATA_[12]/SUB/g;n;/PATTERN2/!bb;ba}' file
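When n reads a line containing PATTERN2, the script branches back to :a (a goto) instead of ending the cycle, so that same line is immediately checked for PATTERN1 and can open the next block. No rewinding is needed.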
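Running both the original range command and the answer's loop on the sample structure shows the difference. A sketch assuming GNU sed and bash; in the demo the label before } is terminated with ; (ba;}) to avoid any ambiguity about where the label name ends:

```shell
#!/usr/bin/env bash
# Sample structure from the question (the "..." are literal text here).
input='PATTERN2 Header PATTERN1
data
DATA_1 ...
DATA_2 ...
data
PATTERN2 Header PATTERN1
data
DATA_1 ...
DATA_2 ...
data
PATTERN2'

# Original range: the header line closes the first range, but it is not
# rechecked as a start line, so the second block is never entered.
naive=$(printf '%s\n' "$input" | sed '/PATTERN1/,/PATTERN2/ {s/DATA_[12]/SUB/g}')

# Answer's loop (GNU sed): after n reads a PATTERN2 line, branch to :a so
# the same line is tested for PATTERN1 again.
fixed=$(printf '%s\n' "$input" | sed ':a;/PATTERN1/{:b;s/DATA_[12]/SUB/g;n;/PATTERN2/!bb;ba;}')

printf '%s\n' "$naive" '---' "$fixed"
```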

Delete code pattern using sed?

I want to use sed to delete a part of the code (a paragraph) that begins with a pattern and ends with a semicolon (;).
I came across an example that handles paragraphs separated by blank lines:
sed -e '/./{H;$!d;}' -e 'x;/Pattern/!d'
I'm confused about how to use the semicolon not as a delimiter but as a pattern.
Thanks.
Another option is to use an address range.
The next example means: delete everything from a line that contains pattern up to the next line that ends with a semicolon.
sed '/pattern/,/;$/ d' infile
EDIT, in response to Harsh's comment:
Try the following sed command:
sed '/^\s*LOG\s*(.*;\s*$/ d ; /^\s*LOG/,/;\s*$/ d' infile
Explanation:
/^\s*LOG\s*(.*;\s*$/ d # Delete line if begins with 'LOG' and ends with semicolon.
/^\s*LOG/,/;\s*$/ d # Delete range of lines between one that begins with LOG and
# other that ends with semicolon.
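A small check of this command on hypothetical C-like input, with one single-line LOG call and one spanning two lines. A sketch assuming GNU sed (\s is a GNU extension) and bash; the input is made up for illustration:

```shell
#!/usr/bin/env bash
# Made-up input: a one-line LOG(...) call and a two-line one.
input='int main(void) {
    LOG("single line");
    int x = 1;
    LOG("spans",
        x);
    return x;
}'

# First command deletes one-line calls; the range deletes multi-line ones.
out=$(printf '%s\n' "$input" |
  sed '/^\s*LOG\s*(.*;\s*$/ d ; /^\s*LOG/,/;\s*$/ d')
printf '%s\n' "$out"
```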
This might work for you:
cat <<! >file
a
b
;
x
y
;
!
sed '/^[^;]*$/{H;$!d};x;s/;//;/x/!d' file
x
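y
(The first line of this output is actually empty, because each chunk collected in the hold space starts with a newline.)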
Explanation:
For any line that does not contain a ; (/^[^;]*$/):
Append the line to the hold space (HS), delete the pattern space (PS) and start the next cycle, unless it is the last line of the file ({H;$!d}).
For a line containing ; (or the last line of the file):
Swap to the HS (x)
Delete the first ; (s/;//)
Search for the pattern (x) and, if it is not found, delete the PS (/x/!d)
N.B. /x/ finds the pattern anywhere in the collected chunk. Since each chunk starts with a newline, use /^\nx/ (GNU sed) to match the pattern only at the beginning of the chunk.
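A sketch of the hold-space approach, assuming GNU sed and bash. The first run uses the sample from the answer and shows that the output starts with an empty line; the second input is hypothetical, with x also appearing inside the first chunk, to compare the unanchored /x/ with an anchored /^\nx/:

```shell
#!/usr/bin/env bash
# Sample from the answer: two ;-terminated chunks.
out=$(printf '%s\n' a b ';' x y ';' |
  sed '/^[^;]*$/{H;$!d};x;s/;//;/x/!d')

# Hypothetical input: the first chunk (a, x) also contains an x line.
unanchored=$(printf '%s\n' a x ';' x y ';' |
  sed '/^[^;]*$/{H;$!d};x;s/;//;/x/!d')

# Anchoring after the chunk's leading newline keeps only chunks that
# start with x (\n in a regex is GNU sed syntax).
anchored=$(printf '%s\n' a x ';' x y ';' |
  sed '/^[^;]*$/{H;$!d};x;s/;//;/^\nx/!d')

printf '[%s]\n' "$out" "$unanchored" "$anchored"
```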
EDIT:
After having seen your data and expected result, this may work for you:
sed '/^\s*LOG(.*);/d;/^\s*LOG(/,/);/d' file
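The first command in this script matters: a range only looks for its end on the lines after the one that started it, so a one-line LOG(...); would otherwise open a range that swallows the following lines. A sketch assuming GNU sed and bash, on made-up C-like input, comparing the full script with a range-only variant:

```shell
#!/usr/bin/env bash
# Made-up input: a one-line LOG(...) call and a two-line one.
input='int main(void) {
    LOG("single line");
    int x = 1;
    LOG("spans",
        x);
    return x;
}'

# Full script from the answer.
full=$(printf '%s\n' "$input" | sed '/^\s*LOG(.*);/d;/^\s*LOG(/,/);/d')

# Range only: the one-line call starts a range that is not closed on the
# same line, so "int x = 1;" is deleted too.
range_only=$(printf '%s\n' "$input" | sed '/^\s*LOG(/,/);/d')

printf '%s\n---\n' "$full" "$range_only"
```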