sed - search one file for the contents of another

There are similar questions, but here is specifically what I want to do.
I have one really long file, long.txt, that looks like
line1
line2
line3
line4
line1
line1
line2
line8
line1
line2
Now, I have another file, pattern.txt, that looks like
line1
line2
Finally, I have replace.txt, which looks like
newline1
newline2
Is there a way to call sed such that after running it on the above, I end up with
newline1
newline2
line3
line4
line1
newline1
newline2
line8
newline1
newline2

This might work for you (GNU sed):
cat <<\! >cat.sed
:a;$!{N;ba};s/\n/\\n/g
!
sed ':a;$!'"{N;ba};s/$(sed -f cat.sed pattern.txt)/$(sed -f cat.sed replace.txt)/g" long.txt
newline1
newline2
line3
line4
line1
newline1
newline2
line8
newline1
newline2
Explanation:
Build the LHS (pattern) and RHS (replace) of a sed substitution using a generic sed script - cat.sed
Plug the above substitution into another sed script that processes the long.txt file.
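As a rough illustration of the two steps above, the sketch below rebuilds the sample files in a temporary directory and keeps the intermediate strings in variables. It assumes GNU sed and that pattern.txt and replace.txt contain no "/" or regex metacharacters; join_lines, lhs, rhs and result are names made up for this example.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed and plain-text patterns (no "/" or regex metacharacters).
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' line1 line2 line3 line4 line1 line1 line2 line8 line1 line2 > long.txt
printf '%s\n' line1 line2 > pattern.txt
printf '%s\n' newline1 newline2 > replace.txt

# Hypothetical helper: join all lines of a file with a literal backslash-n between them.
join_lines() { sed ':a;$!{N;ba};s/\n/\\n/g' "$1"; }

lhs=$(join_lines pattern.txt)   # pattern side of the substitution
rhs=$(join_lines replace.txt)   # replacement side of the substitution

# Slurp long.txt into the pattern space, then apply the generated substitution.
result=$(sed ':a;$!{N;ba};'"s/$lhs/$rhs/g" long.txt)
printf '%s\n' "$result"
```

Because the whole file is read into the pattern space first, the pattern can span line boundaries, which a line-by-line substitution cannot do.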

$ paste -d'/' pattern.txt replace.txt | sed 's#.*#s/&/#' >script.sed
$ sed -f script.sed long.txt
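To see what the generated script.sed contains, here is a small sketch that rebuilds the sample files in a temporary directory (the result variable is only for illustration). Note that each generated command is applied to every line independently, so the lone line1 on the fifth line of long.txt is also rewritten, which differs from the expected output in the question.

```shell
#!/bin/sh
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' line1 line2 line3 line4 line1 line1 line2 line8 line1 line2 > long.txt
printf '%s\n' line1 line2 > pattern.txt
printf '%s\n' newline1 newline2 > replace.txt

# Pair line N of pattern.txt with line N of replace.txt, then wrap each pair as s/old/new/.
paste -d'/' pattern.txt replace.txt | sed 's#.*#s/&/#' > script.sed
cat script.sed

# Every generated substitution runs against each input line on its own.
result=$(sed -f script.sed long.txt)
printf '%s\n' "$result"
```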

Related

Using sed, extract text between first occurrence of a word1 and last occurrence of a word2

I need to extract text between the first occurrence of a word called "BEGIN" and the last occurrence of a word called "END" using sed.
Input:
line1
BEGIN
line2
line3
END
line4
line5
BEGIN
line6
line7
ENDED
END
line8
END
line9
line10
Expected Output:
BEGIN
line2
line3
END
line4
line5
BEGIN
line6
line7
ENDED
END
line8
END
My approach:
My approach extracts the text between each BEGIN and END pair. The input has two such pairs, so it prints the text of each pair separately.
It fails to extract the text between the first occurrence of word1 (BEGIN) and the last occurrence of word2 (END).
dsonachalam$ sed -n -e '/^BEGIN$/,/^END$/p' logs.txt
BEGIN
line2
line3
END
BEGIN
line6
line7
ENDED
END
start=$(grep -n "BEGIN" "$FILE_NAME" | cut -f1 -d: | head -n 1)
end=$(grep -n "END" "$FILE_NAME" | cut -f1 -d: | tail -n 1)
sed -n "${start},${end}p" "$FILE_NAME"
If the file is small enough to fit memory:
$ perl -0777 -ne 'print /(^BEGIN\n.*^END\n)/ms' ip.txt
BEGIN
line2
line3
END
line4
line5
BEGIN
line6
line7
ENDED
END
line8
END
A 2-pass approach avoids storing any text in memory, so it works for an input file of any size, and it needs only 1 call to 1 standard UNIX tool, so no extra subshells are spawned. The following will work using any awk in any shell on every UNIX box:
$ awk '
NR==FNR{ if (!beg && /BEGIN/) beg=NR; if (/END/) end=NR; next}
(beg <= FNR) && (FNR <= end)
' file file
BEGIN
line2
line3
END
line4
line5
BEGIN
line6
line7
ENDED
END
line8
END
A one-liner sed command would suffice (using GNU sed):
sed -E '/^BEGIN$/,$!d; :a; /(^|\n).*END$/{p;d}; $d; N; ba'
/^BEGIN$/,$!d; deletes lines above the first BEGIN. :a; /(^|\n).*END$/{p;d}; $d; N; ba accumulates ("slurps") lines into pattern space. Whenever an END line is read then the accumulated lines are printed out and pattern space is deleted starting a new cycle. Note that this "slurping" approach may be slow, or even may crash the sed process if the input is too large.
Content of input file:
line1
BEGIN
line2
line3
END
line4
line5
BEGIN
line6
line7
ENDED
END
line8
END
line9
line10
and using GNU sed 4.8
sed -E '/^BEGIN$/,$!d; :a; /(^|\n).*END$/{p;d}; $d; N; ba' inputfile
prints
BEGIN
line2
line3
END
line4
line5
BEGIN
line6
line7
ENDED
END
line8
END
Another approach would be:
lastend=$(sed -n '/^END$/=' inputfile | tail -1)
[[ -n $lastend ]] && sed -n "/^BEGIN\$/,${lastend}p" inputfile
This two-pass approach doesn't suffer from "slurping" lines.
This might work for you (GNU sed):
sed -n '/\<BEGIN\>/{x;:a;n;/\<END\>/{x;p;ba};H;$!ba;x;//P}' file
Set automatic printing off by using the -n option and then focus on lines following one that contains the word BEGIN.
Swap to the hold space (HS) and initiate a loop that fetches the next line and if that line contains the word END swap to the HS, print its contents and repeat.
If the current line does not contain the word END, append the current line to the HS and unless it is the end of file repeat.
At the end of the file, print the first line of the HS if it begins with END and, whatever the outcome, allow file processing to terminate.
Thus processing of lines only occurs once the word BEGIN has been seen and printing of those lines every time the word END occurs.
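A quick way to check this behaviour is to run the command against the sample input from the question. The sketch below does that in a temporary directory; it assumes GNU sed, since \< and \> are GNU extensions, and the result variable is only for illustration.

```shell
#!/bin/sh
# Assumes GNU sed: \< and \> (word boundaries) are GNU extensions.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' line1 BEGIN line2 line3 END line4 line5 BEGIN line6 line7 \
    ENDED END line8 END line9 line10 > file

# Lines are collected in the hold space and flushed each time an END line arrives.
result=$(sed -n '/\<BEGIN\>/{x;:a;n;/\<END\>/{x;p;ba};H;$!ba;x;//P}' file)
printf '%s\n' "$result"
```

Because of the word boundaries, ENDED does not count as END, so it is carried along as an ordinary line.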

sed insert content of a file after a line specified by number

I know about the r command in sed and that it can be used with a regular search pattern or a regexp. However, I found that regexps in sed work differently than in grep, so I just want to use grep to get the line number after which to insert text. I cannot find how to tell sed the line number after which to insert the text of an external file (the r command). Any ideas?
Here is the expected output.
Input file 1 a.tmp:
Line1
Line2
Line3
Input file 2 b.tmp:
SubLine1
SubLine2
SubLine3
Suppose I want to insert b.tmp into a.tmp after line #2. I would expect to see this:
Line1
Line2
SubLine1
SubLine2
SubLine3
Line3
How would I do it?
Just use awk. Look how simple and consistent (and also efficient and portable to all awks in all shells on every UNIX box) it is to do whatever you want:
Insert a file after a line number:
$ awk 'NR==FNR{n=n s $0; s=ORS; next} {print} FNR==2{print n}' b.tmp a.tmp
Line1
Line2
SubLine1
SubLine2
SubLine3
Line3
Insert a file after a line containing a string matching a regexp:
$ awk 'NR==FNR{n=n s $0; s=ORS; next} {print} /Line2/{print n}' b.tmp a.tmp
Line1
Line2
SubLine1
SubLine2
SubLine3
Line3
Insert a file after a line that is a string (full line string match):
$ awk 'NR==FNR{n=n s $0; s=ORS; next} {print} $0=="Line2"{print n}' b.tmp a.tmp
Line1
Line2
SubLine1
SubLine2
SubLine3
Line3
Insert a file after a line containing a string (partial line substring match):
$ awk 'NR==FNR{n=n s $0; s=ORS; next} {print} index($0,"Line2"){print n}' b.tmp a.tmp
Line1
Line2
SubLine1
SubLine2
SubLine3
Line3
Insert a file before a line number:
$ awk 'NR==FNR{n=n s $0; s=ORS; next} FNR==2{print n} {print}' b.tmp a.tmp
Line1
SubLine1
SubLine2
SubLine3
Line2
Line3
Insert a file instead of a line number:
$ awk 'NR==FNR{n=n s $0; s=ORS; next} FNR==2{print n; next} {print}' b.tmp a.tmp
Line1
SubLine1
SubLine2
SubLine3
Line3
etc., etc. - any kind of matching you want to do and any action you want to take when that match succeeds is trivial, consistent, clear, portable, efficient and easy to modify/expand if/when your requirements change.
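All of the variants above share one idiom: the first file is buffered in a variable, then the second file is printed with that buffer emitted at the chosen point. A commented sketch of the first variant follows, using the sample files from the question rebuilt in a temporary directory; the result variable is only for illustration.

```shell
#!/bin/sh
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' Line1 Line2 Line3 > a.tmp
printf '%s\n' SubLine1 SubLine2 SubLine3 > b.tmp

result=$(awk '
    # NR==FNR holds only while the first file (b.tmp) is being read.
    # Its lines are collected in n, joined by ORS with no trailing separator.
    NR == FNR { n = n s $0; s = ORS; next }
    # Second file (a.tmp): print every line, and the saved text after line 2.
    { print }
    FNR == 2 { print n }
' b.tmp a.tmp)
printf '%s\n' "$result"
```

Moving the FNR == 2 rule before the print rule, or adding next to it, gives the "before" and "instead of" variants.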
I will quote this page (actually a really good tutorial about sed in every respect):
There is also a command for reading files. The command
sed '$r end' <in>out
will append the file "end" at the end of the file (address $). The following will insert a file after the line with the word "INCLUDE"
sed '/INCLUDE/ r file' <in >out
You can use the curly braces to delete the line having the "INCLUDE" command on it:
sed '/INCLUDE/ {
r file
d
}'
The order of the delete command d and the read file command r is important. Change the order and it will not work. There are two subtle actions that prevent this from working. The first is the r command writes the file to the output stream. The file is not inserted into the pattern space, and therefore cannot be modified by any command. Therefore the delete command does not affect the data read from the file.
The other subtlety is the d command deletes the current data in the pattern space. Once all of the data is deleted, it does make sense that no other action will be attempted.
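For the line-number case asked about in the question, the address in front of r can simply be a number. A minimal sketch with the sample files, rebuilt in a temporary directory, is shown here; the variable names are made up for this example, and the grep step assumes that the first match is the line wanted.

```shell
#!/bin/sh
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' Line1 Line2 Line3 > a.tmp
printf '%s\n' SubLine1 SubLine2 SubLine3 > b.tmp

# Fixed line number: read b.tmp after line 2 of a.tmp.
by_number=$(sed '2r b.tmp' a.tmp)

# Line number found with grep: keep the number of the first match only.
n=$(grep -n 'Line2' a.tmp | head -n 1 | cut -d: -f1)
by_grep=
if [ -n "$n" ]; then
    # Guard against an empty n, which would leave r without an address.
    by_grep=$(sed "${n}r b.tmp" a.tmp)
fi

printf '%s\n' "$by_number"
```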

Sed combine only certain lines within directory

I am using sed to combine lines of text files in a directory.
The command cd dir && sed -e 'N;s/\n//' *.txt works fine for that, but can it be tweaked to join only a line that starts with ** with the following line that ends in **? So
This is Line1
**This is Line2
This is Line3**
This is Line4
This is Line5
Becomes
This is Line1
** This is Line2 This is Line3**
This is Line4
This is Line5
etc
sed is for simple substitutions on individual lines, that is all. For anything else you should be using awk. This will do what you show with your sample input/output:
$ awk '{ORS=(/^\*\*/?FS:RS)}1' file
This is Line1
**This is Line2 This is Line3**
This is Line4
This is Line5
but of course it doesn't address any of the requirements you haven't shared with us yet (e.g. what to do when a line starts with ** but the next line doesn't end with ** or vice-versa or a line starts and ends with ** or a line starting with ** is at the end of the input file or....).
Sed is your friend
$ sed '/^\*\*/{:l1;/\*\*$/!{N;bl1};s/\n/ /g;}' file
This is Line1
**This is Line2 This is Line3**
This is Line4
This is Line5
You can use this sed:
sed '/^\*\*/{:loop; N; /\*\*$/{s/\n/ /g;p;d;}; b loop;}' file
Test:
$ cat file
This is Line1
**This is Line2
in between
This is Line3**
This is Line4
**This is Line5
This is Line6**
$ sed '/^\*\*/{:loop; N; /\*\*$/{s/\n/ /g;p;d;}; b loop;}' file
This is Line1
**This is Line2 in between This is Line3**
This is Line4
**This is Line5 This is Line6**
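The same loop may be easier to follow when written out over several lines with comments. This is only a sketch of the command above, run against the test file from this answer in a temporary directory; the result variable is there for illustration.

```shell
#!/bin/sh
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' 'This is Line1' '**This is Line2' 'in between' 'This is Line3**' \
    'This is Line4' '**This is Line5' 'This is Line6**' > file

result=$(sed '
# Only a line that starts with ** opens a block.
/^\*\*/ {
    :loop
    # Append the next input line to the pattern space.
    N
    # Once the buffer ends with **, join its lines, print it and start a new cycle.
    /\*\*$/ {
        s/\n/ /g
        p
        d
    }
    b loop
}
' file)
printf '%s\n' "$result"
```

The d command ends the cycle without auto-printing, which is why the explicit p is needed just before it.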
$ cat ip.txt
This is Line1
**This is Line2
This is Line3**
This is Line4
This is Line5
$ # this slurps entire file
$ perl -0777 -pe 's/^(\*\*.*)\n(.*\*\*)$/$1 $2/mg' ip.txt
This is Line1
**This is Line2 This is Line3**
This is Line4
This is Line5
$ # can use this if testing start of line for ** is enough
$ perl -pe 's/\n/ / if /^\Q**/' ip.txt
Reference: How do I search and replace across multiple lines with Perl?

How to delete the records that have '?' in a file using sed

How can I delete the records that contain '?' from the file?
Input file data
12345 Line1
?
34567 Line2
?
89101 Line3
Expected Output
12345 Line1
34567 Line2
89101 Line3
sed '/?/d' yourfile
or
grep -v '?' yourfile
If you only want to delete lines that consist of a single '?' and nothing else, use '^?$' instead of just the ?.
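The difference between the two patterns shows up once a record merely contains a question mark. In the sketch below, the last input line is a hypothetical extra record added for illustration, and the variable names are made up; the rest is the sample from the question.

```shell
#!/bin/sh
tmp=$(mktemp -d) && cd "$tmp" || exit 1
# Sample from the question plus one hypothetical record that only contains a "?".
printf '%s\n' '12345 Line1' '?' '34567 Line2' '?' '89101 Line3' 'what? Line4' > yourfile

any=$(sed '/?/d' yourfile)       # drops every line that contains ?
only=$(sed '/^?$/d' yourfile)    # drops only lines that are exactly ?
same=$(grep -v '^?$' yourfile)   # grep equivalent of the anchored form

printf '%s\n' "$only"
```

In a basic regular expression, as used here by sed and grep, an unescaped ? is a literal character, so no backslash is needed.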
It seems you want to replace a blank line, followed by a ? symbol, followed by another blank line, with a single blank line. If so, try the Perl command below.
perl -0777 -pe 's/\n\?\n//g' file
Example:
$ perl -0777 -pe 's/\n\?\n//g' file
12345 Line1
34567 Line2
89101 Line3

merging matched lines with sed

I saw some answers here, but can't make them work for me.
I have text like this:
line1
line2 text=^M
line3
line4
Basically, I need to replace =^M\n with nothing, something like s/=^M\n//, so that the output is as follows (^M is the special character typed as Ctrl+V Ctrl+M):
line1
line2 textline3
line4
I know it's some sed branches but I have problem with making them work.
One way:
$ sed '/^M/{N;s/=^M\n//;}' file
line1
line2 textline3
line4
Where ^M has to be typed as: Ctrl-V + Ctrl-M
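If typing the control character is awkward, for example inside a script file, the carriage return can be produced with printf and spliced into the sed script. A minimal sketch follows, assuming the ^M in the data really is a carriage return; cr and result are names made up for this example.

```shell
#!/bin/sh
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf 'line1\nline2 text=\r\nline3\nline4\n' > file

# A real carriage return, so ^M never has to be typed by hand.
cr=$(printf '\r')

# On a line ending in "=" plus CR, append the next line and remove "=", CR and the newline.
result=$(sed "/=${cr}\$/{N;s/=${cr}\n//;}" file)
printf '%s\n' "$result"
```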
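An awk solution for this:
# Usage: awk -f myawk.awk temp.txt
# Assumes the marker is "=" followed by a real carriage return (shown as ^M).
BEGIN { print "Start Records" }
{
    if ($0 ~ /=\r$/) {
        sub(/=\r$/, "")      # drop the trailing "=^M"
        held = held $0       # keep the rest until the next line arrives
    } else {
        print held $0        # prints the joined line, or the line unchanged
        held = ""
    }
}
END { print "Process Complete" }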