sed - process line twice or rewind

I'm editing data between two patterns, and I'm running into a problem where sed fails to match patterns that are back to back because the first pattern occurs in the same line as the second pattern.
The structure of my data looks something like this:
PATTERN2 Header PATTERN1
data
DATA_1 ...
DATA_2 ...
data
PATTERN2 Header PATTERN1
data
DATA_1 ...
DATA_2 ...
data
data
data
PATTERN2
...
and my sed command looks like this:
sed '/PATTERN1/,/PATTERN2/ {s/DATA_[12]/SUB/g}' myFile
The number of lines between the patterns is dynamic and there is no other reliable pattern to search on other than what is printed in the Header line. The Header line is the only indicator of the end of a block of data.
Is there an opposite of 'n' to "rewind" one line?
thanks!

This might work for you (GNU sed):
sed ':a;/PATTERN1/{:b;s/DATA_[12]/SUB/g;n;/PATTERN2/!bb;ba}' file
On reaching PATTERN2, this branches back to the start of the script, so the same line is checked again for PATTERN1.
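To see the loop handle a header line that both closes one block and opens the next, here is a small sketch on made-up sample data. It assumes GNU sed; the script is the same one written in multi-line form, so each label sits on its own line, and the sample lines are invented for illustration.

```shell
# Hypothetical sample: each header line carries both patterns.
input='PATTERN2 Header PATTERN1
DATA_1 a
data
PATTERN2 Header PATTERN1
DATA_2 b
PATTERN2
DATA_1 untouched'

result=$(printf '%s\n' "$input" | sed '
:a
/PATTERN1/{
  :b
  s/DATA_[12]/SUB/g
  n
  /PATTERN2/!bb
  ba
}')
# Prints the input with "DATA_1 a" and "DATA_2 b" changed to "SUB a" and "SUB b";
# the final DATA_1 line lies outside any block and is left alone.
printf '%s\n' "$result"
```

The inner loop (`:b` ... `bb`) consumes lines with `n` until PATTERN2 shows up; `ba` then re-tests that very line for PATTERN1 instead of fetching a new one.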

Move everything after the 6th slash up one line with sed

http://www.somesite/play/episodes/xyz/fred-episode-110
http://www.somesite/play/episodes/abc/simon-episode-266
http://www.somesite/play/episodes/qwe/mum-episode-39
http://www.somesite/play/episodes/zxc/dad-episode-41
http://www.somesite/play/episodes/asd/bob-episode-57
I have many URLs saved in a text file, as shown above. I want to put everything after the 6th slash on its own line above the URL, using a sed script.
The text after the 6th slash is the title and is always different; I need to select the title so I can play it.
So I need it to look like this:
fred-episode-110
http://www.somesite/play/episodes/xyz/fred-episode-110
simon-episode-266
http://www.somesite/play/episodes/abc/simon-episode-266
mum-episode-39
http://www.somesite/play/episodes/qwe/mum-episode-39
dad-episode-41
http://www.somesite/play/episodes/zxc/dad-episode-41
bob-episode-57
http://www.somesite/play/episodes/asd/bob-episode-57
I can do this with awk, but I want to do it with just sed.
You can use the following sed command:
sed 'h;s#\([^/]*/\)\{6\}##;p;x;' sed_test.txt
On your input, this prints each title on its own line, followed by the full URL.
Explanations:
h; copies the pattern buffer to the hold buffer
s#\([^/]*/\)\{6\}##; deletes everything up to and including the 6th / from the pattern buffer
p; prints the pattern buffer (now just the title)
x exchanges the contents of the pattern buffer and the hold buffer
then the default action runs and prints the pattern buffer, which now holds the original URL
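The steps above can be checked on two of the URLs from the question. This is a minimal sketch; the `urls` and `result` variable names are just for illustration.

```shell
urls='http://www.somesite/play/episodes/xyz/fred-episode-110
http://www.somesite/play/episodes/abc/simon-episode-266'

# h saves the URL, s strips the six leading "segment/" pieces,
# p prints the title, x brings the URL back for the default print.
result=$(printf '%s\n' "$urls" | sed 'h;s#\([^/]*/\)\{6\}##;p;x')
printf '%s\n' "$result"
```

Note that the empty segment between the two slashes of `http://` counts as one of the six, which is why the match ends right after `xyz/`.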
You can use this one too (GNU sed, as it relies on \n in the replacement):
sed -E 's|(.*/)(.*)|\2\n&|' infile

Extract filename from multiple lines in unix

I'm trying to extract the name of a file that has been generated by a Java program. The program prints multiple lines, and I know exactly what the format of the file name is going to be. Its output looks like this:
ABCASJASLEKJASDFALDSF
Generated file YANNANI-0008876_17.xml.
TDSFALSFJLSDJF;
I'm capturing the output in a variable and then applying a sed operator in the following format:
sed -n 's/.*\(YANNANI.\([[:digit:]]\).\([xml]\)*\)/\1/p'
The result set is:
YANNANI-0008876_17.xml.
However, my problem is that I want the extraction of the filename to stop at .xml. The last dot should never be extracted.
Is there a way to do this using sed?
Let's look at what your regex actually matches:
$ grep 'YANNANI.\([[:digit:]]\).\([xml]\)*' infile
Generated file YANNANI-0008876_17.xml.
That's probably not what you intended:
\([[:digit:]]\) captures just a single digit (and the capture group around it doesn't do anything)
\([xml]\)* is "any of x, m or l, 0 or more times", so it matches the empty string (as above – or the line wouldn't match at all!), x, xx, lll, mxxxxxmmmmlxlxmxlmxlm, xml, ...
There is no way the final period is removed because you don't match anything after the capture groups
What would make sense instead:
Match "digits or underscores, 0 or more": [[:digit:]_]*
Match .xml, literally (escape the period): \.xml
Make sure the rest of the line (just the period, in this case) is matched by adding .* after the capture group
So the regex for the string you'd like to extract becomes
$ grep 'YANNANI.[[:digit:]_]*\.xml' infile
Generated file YANNANI-0008876_17.xml.
and to remove everything else on the line using sed, we surround the regex with .*\( ... \).*:
$ sed -n 's/.*\(YANNANI.[[:digit:]_]*\.xml\).*/\1/p' infile
YANNANI-0008876_17.xml
This assumes you really meant . after YANNANI (any character).
You can call sed twice: first in printing and then in replacement mode:
sed -n 's/.*\(YANNANI.\([[:digit:]]\).\([xml]\)*\)/\1/p' | sed 's/\.$//g'
The second sed removes the trailing . from each line printed by the first one.
Or you can go for an awk solution, if you prefer:
awk '/.*YANNANI.[0-9]+.[0-9]+.xml/{print substr($NF,1,length($NF)-1)}'
This prints the last field, with its last character cut off by substr, of every line that matches your regex.

How to search only first pattern range in sed

My input file looks something like this
Start1
some text
that I want
modified
Pattern1
some other text
which I do not want
to modify
End1
Start1
Pattern2
End1
My sed pattern looks like this
/Start1/,/Pattern1/c\
Start1\
Modification text here\
Pattern1\
additional modifications
I only want the text within the first range of Start1 and End1 modified.
Additionally, I am also specifying Pattern1, which does not exist in the second range.
I run
sed -i -f <sed_file> <input_file>
However, my output is given below. For some reason it wipes out the second range even though Pattern1 does not exist in it.
Start1
Modification text here
Pattern1
additional modifications
some other text
which I do not want
to modify
End1
Expected result
Start1
Modification text here
Pattern1
additional modifications
some other text
which I do not want
to modify
End1
Start1
Pattern2
End1
Try this one
sed ':A;/Start1/!b;N;/Pattern1/!bA;s/\(Start1\n\)\(.*\)\(\nPattern1\)/\1Modification text here\3\nadditional modifications/' infile
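As a rough walk-through of that one-liner, assuming GNU sed: lines without Start1 are printed untouched; once Start1 is seen, lines are appended with N until Pattern1 turns up, and only then is the collected block rewritten. The sketch below runs the same script in multi-line form on a shortened copy of the question's input.

```shell
input='Start1
some text
that I want
modified
Pattern1
some other text
End1
Start1
Pattern2
End1'

result=$(printf '%s\n' "$input" | sed '
:A
/Start1/!b
N
/Pattern1/!bA
s/\(Start1\n\)\(.*\)\(\nPattern1\)/\1Modification text here\3\nadditional modifications/')
# The first block is rewritten; the second block is printed unchanged.
printf '%s\n' "$result"
```

The second block survives because, when N runs out of input, GNU sed prints the accumulated pattern space before exiting. Other sed implementations may drop it instead.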
In GNU sed:
sed -e '/START/,/END/c TEXT'
is not the same as
sed -e '/START/,/END/{c TEXT' -e '}'
The first will start omitting the range from the output stream and emit one instance of TEXT into the output stream upon reaching the end of the range. The second will replace each line in the range with TEXT.
Your issue is that the second range is being omitted from the output stream even though you never reach the end of the second range. /START/,/END/c where /END/ never appears is basically like /START/,$d
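The three behaviours described here can be seen side by side in a small sketch. It assumes GNU sed and its one-line `c TEXT` form; the sample input and the variable names `once`, `each` and `open` are made up for illustration.

```shell
input='before
START
x
END
after'

# Range directly on c: the range is dropped and TEXT is emitted once, at its end.
once=$(printf '%s\n' "$input" | sed -e '/START/,/END/c TEXT')

# c inside a block: every line of the range is replaced by TEXT.
each=$(printf '%s\n' "$input" | sed -e '/START/,/END/{c TEXT' -e '}')

# END never appears: everything from START on is dropped and TEXT is never emitted.
open=$(printf '%s\n' 'before' 'START' 'x' | sed -e '/START/,/END/c TEXT')

printf '%s\n---\n%s\n---\n%s\n' "$once" "$each" "$open"
```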
The only solutions that I can figure are clunky:
/Start1/,/Pattern1/{
/Pattern1/{
# Insert into output stream
i\
Start1\
Modification text here\
Pattern1\
additional modifications
# Read in the rest of the file
:a
$!N
$!ba
# Remove the original Pattern1 line from the pattern space
# (Remove first line and newline of pattern space)
s/^[^\n]*\n//
# Print pattern space and quit
q
}
# Delete lines in the range other than /Pattern1/
d
}

Can't replace '\n' with '\\' for whatever reason

I have a whole bunch of files, and I wish to change something like this:
My line of text
My other line of text
Into
My line of text\\
My other line of text
Seems simple, but somehow it isn't. I have tried sed s,"\n\n","\\\\\n", as well as tr '\n' '\\' and about 20 other incarnations of these commands.
There must be something going on which I don't understand... but I'm completely lost as to why nothing is working. I've had some comical things happening too, like when cat'ing out the file, it doesn't print newlines, only writes over where the rest was written.
Does anyone know how to accomplish this?
sed works on lines. It fetches a line, applies your code to it, fetches the next line, and so forth. Since lines are treated individually, multiline regexes don't work quite so easily.
In order to use multiline regexes with sed, you have to first assemble the file in the pattern space and then work on it:
sed ':a $!{ N; ba }; s/\n\n/\\\\\n/g' filename
The trick here is the
:a $!{ N; ba }
This works as follows:
:a # jump label for looping
$!{ # if the end of the input has not been reached
N # fetch the next line and append it to what we already have
ba # go to :a
}
Once this is over, the whole file is in the pattern space, and multiline regexes can be applied to it. Of course, this requires that the file is small enough to fit into memory.
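To make the difference concrete, here is a small sketch (assuming GNU sed) that runs the same substitution with and without the slurp loop on the two-line example from the question. The loop is split across `-e` fragments so each label and brace ends its own script line; the variable names are only for illustration.

```shell
input='My line of text

My other line of text'

# Line by line: the pattern space never holds a newline, so nothing matches.
plain=$(printf '%s\n' "$input" | sed 's/\n\n/\\\\\n/g')

# Whole file in the pattern space first, then the substitution can match.
slurped=$(printf '%s\n' "$input" |
  sed -e ':a' -e '$!{N;ba' -e '}' -e 's/\n\n/\\\\\n/g')

printf '%s\n---\n%s\n' "$plain" "$slurped"
```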
sed is line-oriented and so is inappropriate to try to use on problems that span lines. You just need to use a record-oriented tool like awk:
$ awk -v RS='^$' -v ORS= '{gsub(/\n\n/,"\\\\\n")}1' file
My line of text\\
My other line of text
The above uses GNU awk for multi-char RS.
Here is an awk solution.
If the blank lines could contain tabs or spaces, use this:
awk '!NF{a=a"\\\\"} b{print a} {a=$0;b=NF} END {print a}' file
My line of text\\
My other line of text
If the blank line is completely empty, this should do:
awk '!NF{a=a"\\\\"} a!=""{print a} {a=$0} END {print a}' file
This might work for you (GNU sed):
sed 'N;s|\n$|\\\\|;P;D' file
This keeps 2 lines in the pattern space at any point in time and, when the second one is empty, replaces it with a double backslash appended to the first.
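A quick way to watch the two-line window slide is to feed it input where the blank line does not come right after the first line. This is a sketch assuming GNU sed, using a variant of the command that appends the `\\` the question asks for; the sample lines are invented.

```shell
input='first line
My line of text

My other line of text'

# N pairs the current line with the next one; if the pair ends in an
# empty line, that line is replaced by \\ . P prints the first line of
# the window and D drops it, restarting with what is left.
result=$(printf '%s\n' "$input" | sed 'N;s|\n$|\\\\|;P;D')
printf '%s\n' "$result"
```

Unlike slurping the whole file, this never holds more than two lines in memory.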

How to insert the content of a file two lines after the line where a pattern is found?

I have a file like the one below, and I want to search for the pattern "Unix" and insert the content of another file two lines after the line where the pattern is matched. I want to do it in sed.
$ cat text1
Unix
Windows
Database
Wintel
Sql
Java
$
Output should be
Unix
Windows
Database
CONTENT OF ANOTHER FILE
Wintel
Sql
Java
It looks a bit funny, but this works with both GNU sed and BSD sed (on Mac OS X), and should work with most versions of sed:
sed -e '/Unix/{N;N;p;r content' -e 'd;}' data
Or:
sed -e '/Unix/{
N
N
p
r content
d
}' data
The N commands add extra lines to the pattern space (so the pattern space holds three lines containing Unix, Windows and Database); the p command prints the pattern space; the r content reads the file content and adds it to the output; the d deletes the pattern space; the {} group these operations so that they only occur when the input line matches Unix.
The r content must be at the end of a line of the script, or at the end of a -e argument, as shown. Trying to add a semicolon after it does not work (after all, the file name might contain a semicolon).
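For anyone who wants to try this without touching real files, here is a self-contained sketch that builds the two files in a temporary directory (the directory and the `result` variable are assumptions for illustration) and runs the `-e` form shown above.

```shell
dir=$(mktemp -d)
printf '%s\n' Unix Windows Database Wintel Sql Java > "$dir/data"
printf '%s\n' 'CONTENT OF ANOTHER FILE' > "$dir/content"

# "r content" ends the first -e fragment, so the file name is unambiguous.
result=$(cd "$dir" && sed -e '/Unix/{N;N;p;r content' -e 'd;}' data)
printf '%s\n' "$result"

rm -rf "$dir"
```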
This might work for you (GNU sed):
sed '/Unix/!b;n;n;r another_file' text1
If the line doesn't contain Unix, bail out (the line is printed as usual). Otherwise print it and fetch the next line, do the same once more, and then queue the second file for output.
N.B. The second line following Unix is printed first, as it is now the pattern space of the current cycle; another_file is written to the output after it, at the end of the current cycle.
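The ordering described in the note can be checked with a throwaway setup. This is a sketch assuming GNU sed; the temporary directory and the `result` variable are made up for the example, while the file names match the command above.

```shell
dir=$(mktemp -d)
printf '%s\n' Unix Windows Database Wintel Sql Java > "$dir/text1"
printf '%s\n' 'CONTENT OF ANOTHER FILE' > "$dir/another_file"

# n prints the current line and loads the next one, twice; r then queues
# the file, which is written out after "Database" is auto-printed.
result=$(cd "$dir" && sed '/Unix/!b;n;n;r another_file' text1)
printf '%s\n' "$result"

rm -rf "$dir"
```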