sed replace in file until pattern match - sed

I use sed to do a simple replacement to headers in a file.
Sometimes they need to be replaced, sometimes not.
It works fine, but it is slow because it reads the whole file every time (hundreds of MB).
However there is a pattern that separates the header from the content.
How do I tell sed to stop processing the file after encountering a certain pattern?
Example:
blabla headers that I want to edit here but maybe not FRAME some more content here
Let's say that I want to remove "want" from the headers, but the word may or may not be in said headers. I know that I want to stop processing the file at FRAME.
sed -i '0,/\(pattern1\|pattern2\)/s//pattern1/' * ; # TODO stop at FRAME

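You can use the q command to make sed quit without processing the rest of the input:
sed '0,/\(pattern1\|pattern2\)/s//pattern1/; /FRAME/q' file
The /FRAME/ address matches the line containing FRAME, at which point the q command is executed. Be careful when combining q with -i: sed writes out only what it has read so far, so everything after FRAME is dropped from the file, and sed exits without touching any remaining files.
OR
You can restrict the substitution to an address range that runs from the start of the file to the first line containing FRAME (the 0,/regex/ form is a GNU extension):
sed '0,/FRAME/ s/old/new/' file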
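The address-range idea can be tried on a small sample before running it on large files. This is a minimal sketch: the file name sample.txt and its contents are made up for illustration, and it uses 1,/FRAME/, the portable spelling that behaves the same as 0,/FRAME/ as long as FRAME is not on the very first line.

```shell
#!/bin/sh
# Hypothetical sample: two header lines, the FRAME separator, then content.
tmp=$(mktemp -d)
printf '%s\n' \
  'headers that I want to edit' \
  'more want here' \
  'FRAME' \
  'I want this kept' > "$tmp/sample.txt"

# Apply the substitution only from line 1 through the first line matching FRAME.
result=$(sed '1,/FRAME/ s/want //' "$tmp/sample.txt")
printf '%s\n' "$result"

rm -rf "$tmp"
```

Only the two header lines lose the word; the line after FRAME is passed through unchanged. Note that sed still reads the whole file here, it merely skips the substitution once the range has ended.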

You can use awk
awk '/pattern stop/ {f=1} !f {sub(/old data/,"new data")} 1' file
This will replace old data with "new data" as long as pattern stop is not found.
To write data back to original file:
awk 'code' file > tmp && mv tmp file

Related

Move everything after the 6th slash up one line with sed

http://www.somesite/play/episodes/xyz/fred-episode-110
http://www.somesite/play/episodes/abc/simon-episode-266
http://www.somesite/play/episodes/qwe/mum-episode-39
http://www.somesite/play/episodes/zxc/dad-episode-41
http://www.somesite/play/episodes/asd/bob-episode-57
I have many URLs saved in a text file, as shown above. I want to move everything after the 6th slash up one line with a sed script.
The text after the 6th slash is the title and is always different. I need to select the title so I can play it.
So I need it to look like this:
fred-episode-110
http://www.somesite/play/episodes/xyz/fred-episode-110
simon-episode-266
http://www.somesite/play/episodes/abc/simon-episode-266
mum-episode-39
http://www.somesite/play/episodes/qwe/mum-episode-39
dad-episode-41
http://www.somesite/play/episodes/zxc/dad-episode-41
bob-episode-57
http://www.somesite/play/episodes/asd/bob-episode-57
I can do this with awk, but I want to do it using just sed.
You can use the following sed command:
sed 'h;s#\([^/]*/\)\{6\}##;p;x;' sed_test.txt
On your input, this produces the output you asked for.
Explanations:
h; copy the pattern buffer to the hold buffer
s#\([^/]*/\)\{6\}##; delete everything up to and including the 6th / from the pattern buffer
p; print the pattern buffer
x exchange the contents of the pattern buffer and the hold buffer
then sed does its default action and prints the content of the pattern buffer
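The steps above can be checked on two of the sample URLs. This is a minimal sketch; the temporary directory is an assumption, and sed_test.txt is simply the file name used in the command above.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '%s\n' \
  'http://www.somesite/play/episodes/xyz/fred-episode-110' \
  'http://www.somesite/play/episodes/abc/simon-episode-266' > "$tmp/sed_test.txt"

# h: save the full URL; s: strip six "text/" segments; p: print the title;
# x: bring the saved URL back so the default print emits it after the title.
titles=$(sed 'h;s#\([^/]*/\)\{6\}##;p;x' "$tmp/sed_test.txt")
printf '%s\n' "$titles"

rm -rf "$tmp"
```

The empty segment between the two slashes of http:// counts as one of the six, which is why the count lands exactly on the slash before the title.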
You can use this one too
sed -E 's|(.*/)(.*)|\2\n&|' infile

Extract filename from multiple lines in unix

I'm trying to extract the name of a file that has been generated by a Java program. This Java program prints multiple lines, and I know exactly what the format of the file name is going to be. The informational text that the Java program prints is as follows:
ABCASJASLEKJASDFALDSF
Generated file YANNANI-0008876_17.xml.
TDSFALSFJLSDJF;
I'm capturing the output in a variable and then applying a sed operator in the following format:
sed -n 's/.*\(YANNANI.\([[:digit:]]\).\([xml]\)*\)/\1/p'
The result set is:
YANNANI-0008876_17.xml.
However, my problem is that I want the extraction of the filename to stop at .xml. The last dot should never be extracted.
Is there a way to do this using sed?
Let's look at what your capture group actually captures:
$ grep 'YANNANI.\([[:digit:]]\).\([xml]\)*' infile
Generated file YANNANI-0008876_17.xml.
That's probably not what you intended:
\([[:digit:]]\) captures just a single digit (and the capture group around it doesn't do anything)
\([xml]\)* is "any of x, m or l, 0 or more times", so it matches the empty string (as above – or the line wouldn't match at all!), x, xx, lll, mxxxxxmmmmlxlxmxlmxlm, xml, ...
There is no way the final period is removed because you don't match anything after the capture groups
What would make sense instead:
Match "digits or underscores, 0 or more": [[:digit:]_]*
Match .xml, literally (escape the period): \.xml
Make sure the rest of the line (just the period, in this case) is matched by adding .* after the capture group
So the regex for the string you'd like to extract becomes
$ grep 'YANNANI.[[:digit:]_]*\.xml' infile
Generated file YANNANI-0008876_17.xml.
and to remove everything else on the line using sed, we surround the regex with .*\( ... \).*:
$ sed -n 's/.*\(YANNANI.[[:digit:]_]*\.xml\).*/\1/p' infile
YANNANI-0008876_17.xml
This assumes you really meant . after YANNANI (any character).
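Since the question captures the program output in a variable, the corrected expression can be applied to that variable directly. This is a minimal sketch: the variable names output and filename are assumptions, and the three lines are the sample text from the question.

```shell
#!/bin/sh
# Stand-in for the captured output of the Java program.
output=$(printf '%s\n' \
  'ABCASJASLEKJASDFALDSF' \
  'Generated file YANNANI-0008876_17.xml.' \
  'TDSFALSFJLSDJF;')

# Keep only the captured file name; the trailing .* consumes the final period.
filename=$(printf '%s\n' "$output" |
  sed -n 's/.*\(YANNANI.[[:digit:]_]*\.xml\).*/\1/p')
printf '%s\n' "$filename"
```

Because of -n and the p flag, lines that do not contain a matching file name produce no output at all.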
You can call sed twice: first to print the match, then to strip the trailing period:
sed -n 's/.*\(YANNANI.\([[:digit:]]\).\([xml]\)*\)/\1/p' | sed 's/\.$//'
The second sed removes the final . from every line produced by the first.
Or you can go for an awk solution, if you prefer:
awk '/.*YANNANI.[0-9]+.[0-9]+.xml/{print substr($NF,1,length($NF)-1)}'
This prints the last field, with its final character removed by substr, for every line that matches the regex.

Insert specific lines from file before first occurrence of pattern using Sed

I want to insert a range of lines from a file, say something like 210,221r before the first occurrence of a pattern in a bunch of other files.
As I am clearly not a GNU sed expert, I cannot figure how to do this.
I tried
sed '0,/pattern/{210,221r file
}' bunch_of_files
But apparently file is read from line 210 to EOF.
Try this:
sed -r 's/(FIND_ME)/PUT_BEFORE\1/' test.text
-r enables extended regular expressions
the string you are looking for ("FIND_ME") is inside parentheses, which creates a capture group
\1 puts the captured text into the replacement.
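About your second question: You can read the replacement from a file like this*:
sed -r "s/(FIND_ME)/$(cat REPLACEMENT.TXT)\1/" test.text
If you escape the special characters inside REPLACEMENT.TXT beforehand with sed (such as /, &, \ and newlines), you are golden.
*= the command substitution is performed by the shell, not by sed, so the script must be in double quotes; inside single quotes it would be passed to sed literally. It works in bash.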
In https://stackoverflow.com/a/11246712/4328188 CodeGnome gave some "sed black magic":
In order to insert text before a pattern, you need to swap the pattern space into the hold space before reading in the file. For example:
sed '/pattern/ {
h
r file
g
N
}' in
However, to read specific lines from file, one may have to use a two-calls solution similar to dummy's answer. I'd enjoy knowing of a one-call solution if it is possible though.
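A two-call version might look like the sketch below. Everything in it is illustrative: the files named file, target and snippet and their contents are made up, and lines 2,3 stand in for 210,221 so the sample stays short. It relies on the GNU-only 0,/pattern/ address from the question, and on the fact that text queued by r is written out when N fetches the next input line, which is what puts the snippet before the matching line.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Hypothetical donor file and target file.
printf '%s\n' 'a1' 'a2' 'a3' 'a4' > "$tmp/file"
printf '%s\n' 'one' 'pattern' 'two' 'pattern' 'three' > "$tmp/target"

# Call 1: extract the wanted line range into a snippet file.
sed -n '2,3p' "$tmp/file" > "$tmp/snippet"

# Call 2: on the first line matching pattern, queue the snippet with r,
# then let N flush it to the output before the matching line is printed.
inserted=$(sed -e '0,/pattern/{' -e '/pattern/{' \
  -e "r $tmp/snippet" -e 'N' -e '}' -e '}' "$tmp/target")
printf '%s\n' "$inserted"

rm -rf "$tmp"
```

One caveat: if the first match is on the last line of the target, N has no next line to fetch and the snippet ends up after the match instead of before it.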

keep the first part and delete the rest on a specified line using sed

I know a line number in a file where I want to keep the first word and delete the rest up to the end of the line. How do I do this using sed?
So let's say I want to go to line 10 of a file, which looks like this:
goodword "blah blah"\
and what I want is
goodword
I have tried this - sed 's/([a-z])./\1/'
But this does it on all the lines in a file. I want it only on one specified line.
If by "first word" you mean "everything up to the first space", and if by "retain this change in the file itself" you mean that you don't mind creating a new file with the same name as the previous file, and if you have a sed that supports -i, you can probably just do:
sed -i '10s/ .*//' input-file
If you want to be more restrictive in the definition of a word, you can use '10s/\([a-z]*\).*/\1/'
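To see that the line address really limits the edit to one line, the command can be run without -i on a throwaway file. This is a minimal sketch: the filler lines and the variable names are made up, and only line 10 carries the text from the question.

```shell
#!/bin/sh
tmp=$(mktemp -d)
{
  # Nine hypothetical filler lines that must stay untouched.
  for i in 1 2 3 4 5 6 7 8 9; do
    printf 'filler%s "keep this tail"\n' "$i"
  done
  # Line 10 is the line from the question, trailing backslash included.
  printf '%s\n' 'goodword "blah blah"\'
  printf '%s\n' 'filler11 "keep this tail"'
} > "$tmp/input-file"

# The leading 10 restricts the substitution to line 10 only.
edited=$(sed '10s/ .*//' "$tmp/input-file")
line9=$(printf '%s\n' "$edited" | sed -n '9p')
line10=$(printf '%s\n' "$edited" | sed -n '10p')
printf '%s\n%s\n' "$line9" "$line10"

rm -rf "$tmp"
```

Line 9 keeps its quoted tail while line 10 is reduced to its first word.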
Can you use grep or awk to grab just one line, and then pipe it into sed (if grep or awk couldn't do the entire job for you) to work on just one line? I think the key here is isolating that one line first, and then worrying about extracting something from it.
Using awk
awk 'NR==10 {print $1}' file
goodword

How to insert the content of a file two lines after the line where a pattern is found?

I have a file like the one below, and I want to search for the pattern "Unix" and insert the content of another file two lines after the line where the pattern is matched. I want to do it in sed.
$ cat text1
Unix
Windows
Database
Wintel
Sql
Java
$
Output should be
Unix
Windows
Database
CONTENT OF ANOTHER FILE
Wintel
Sql
Java
It looks a bit funny, but this works with both GNU sed and BSD sed (on Mac OS X), and should work with most versions of sed:
sed -e '/Unix/{N;N;p;r content' -e 'd;}' data
Or:
sed -e '/Unix/{
N
N
p
r content
d
}' data
The N commands add extra lines to the pattern space (so the pattern space holds three lines containing Unix, Windows and Database); the p command prints the pattern space; the r content reads the file content and adds it to the output; the d deletes the pattern space; the {} group these operations so that they only occur when the input line matches Unix.
The r content must be at the end of a line of the script, or at the end of a -e argument, as shown. Trying to add a semicolon after it does not work (after all, the file name might contain a semicolon).
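The one-line form can be reproduced end to end with throwaway files. This is a minimal sketch: the temporary directory is an assumption, text1 and content are the file names used in the question and the answer, and the r file name is still the last thing in its -e argument, as required.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '%s\n' Unix Windows Database Wintel Sql Java > "$tmp/text1"
printf '%s\n' 'CONTENT OF ANOTHER FILE' > "$tmp/content"

# N;N pulls in the two following lines, p prints all three, r queues the
# file, and d discards the pattern space so the lines are not printed twice.
merged=$(sed -e "/Unix/{N;N;p;r $tmp/content" -e 'd;}' "$tmp/text1")
printf '%s\n' "$merged"

rm -rf "$tmp"
```

The queued file is written out when the cycle ends, which places it after Database and before Wintel.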
This might work for you (GNU sed):
sed '/Unix/!b;n;n;r another_file' text1
If the line doesn't contain Unix, bail out. Otherwise print it and fetch the next line, do the same again, and then queue the second file for reading.
N.B. The second line following Unix is printed first, as it is now part of the current cycle; another_file is appended to the output (not to the pattern space) at the end of the current cycle.