How to append data at a particular line in a file using sed, where the data is from another file

Suppose I have a config file with some data, for example file1.config, whose contents are:
flag_data_to_be_appended=xyz
and I have another file which is a shell script, for example file2.sh, whose contents are:
./file.config
flag=abc
echo $flag
Now I need to append the information from file1 to file2 at flag, i.e. the output for flag has to look like:
flag=abc xyz
How can I do this with the sed command?

Why not have sed write its own script?
sed -e "$(sed -e 's|^\(.*\)_data_to_be_appended=\(.*\)|/^\1=.*/ s//\& \2/|' cfg)" script
Inner command reads the config file and emits /^flag=.*/ s//& xyz/
which is then applied to the script file.
Output:
./file.config
flag=abc xyz
echo $flag
The two escaped parenthesis pairs capture key and value as \1 and \2.
In s//& \2/ the // is the null regex which matches the last
regex used (in /^…/) and replaces the entire match (&) followed
by the captured value.
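To see the two stages separately, here is a minimal sketch that first prints the generated script and then applies it. The scratch directory and the variable names (tmp, generated, result) are assumptions made for illustration; the file names cfg and script follow the command above.

```shell
# Hypothetical scratch files mirroring the question's data.
tmp=$(mktemp -d)
printf 'flag_data_to_be_appended=xyz\n' > "$tmp/cfg"
printf './file.config\nflag=abc\necho $flag\n' > "$tmp/script"

# Stage 1: turn each config line into a sed command.
generated=$(sed -e 's|^\(.*\)_data_to_be_appended=\(.*\)|/^\1=.*/ s//\& \2/|' "$tmp/cfg")
printf 'generated script: %s\n' "$generated"

# Stage 2: run the generated command against the shell script.
result=$(sed -e "$generated" "$tmp/script")
printf '%s\n' "$result"

rm -rf "$tmp"
```

Inspecting the generated script before applying it is a cheap way to catch config values that contain characters special to sed, such as / or &.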

This might work for you (GNU sed):
sed '/^flag=/s#.*#sed "s/.*=/& /" file1#e' file2
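Match the line starting with flag= in file2 and replace it with a second sed command, which the e flag then executes. The & in the replacement expands to the matched line (flag=abc), so the inner command becomes sed "s/.*=/flag=abc /" file1, which swaps everything up to the = sign in file1's single line for flag=abc and a space, leaving the value appended.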
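The e flag hides the intermediate command, so a small sketch may help: swapping e for p (together with -n) prints the command that would have been executed instead of running it. The scratch directory and the variable names (tmp, built, line) are assumptions for illustration only.

```shell
# Hypothetical scratch files mirroring the question's data.
tmp=$(mktemp -d)
printf 'flag_data_to_be_appended=xyz\n' > "$tmp/file1"
printf './file.config\nflag=abc\necho $flag\n' > "$tmp/file2"

# Same substitution, but print the built command instead of executing it.
built=$(cd "$tmp" && sed -n '/^flag=/s#.*#sed "s/.*=/& /" file1#p' file2)
printf 'command built: %s\n' "$built"

# Running that command by hand yields the replacement line.
line=$(cd "$tmp" && sed "s/.*=/flag=abc /" file1)
printf 'replacement line: %s\n' "$line"

rm -rf "$tmp"
```

Because the matched line ends up inside a shell command, this approach is only safe when the contents of file2 are trusted.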

Related

Replace string with file content

$ cat input.txt
abc
$ sed -e '/PLACE_HOLDER/ {
s/PLACE_HOLDER//g
r input.txt
}' <<< '<div>PLACE_HOLDER</div>'
<div></div>
abc
I am trying to replace PLACE_HOLDER with the content of a file, but sed pastes the file content after the matching line. How can I replace just the match?
This is not a duplicate of
Use the contents of a file to replace a string using SED
None of the answers there address my question specifically. The second one uses a bash variable, which is not appropriate when the file is very large. The first one does not address the problem shown in my example; in fact, my code is exactly the same as that answer's.
Like you discovered, the r command inserts new lines after the current line.
That's not suitable if you want to embed the contents of another file in the middle of other text on the same line which should not be replaced.
A crude fix is to build a sed script from your input file. Notice then that any & characters in the input file have to be escaped, as well as any literal newlines.
Because we will be escaping ampersands, I decided to use that as the separator for the s command, too.
sed 's/\&/\\&/g
1s/^/s\&PLACE_HOLDER\&/
$!s/$/\\/
$s/$/\&/' input.txt |
sed -f - targetfile
Unfortunately, because standard input is tied to -f - your script can't process standard input for replacements. A simple workaround for that is to save the generated sed script to a temporary file and pass that as the value for the -f option; this will also be necessary if your sed is one which does not accept the script on standard input.
I believe this should be reasonably portable, apart from the notes about -f - above.
Demo: https://ideone.com/oVgIni
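The temporary-file workaround mentioned above might look like the following sketch. The scratch directory, the file name replace.sed and the sample contents are assumptions for illustration; the script-building commands are the ones from this answer.

```shell
# Hypothetical scratch files; the input contains an & and spans two lines.
tmp=$(mktemp -d)
printf 'a & b\nsecond line\n' > "$tmp/input.txt"
printf '<div>PLACE_HOLDER</div>\n' > "$tmp/targetfile"

# Build the sed script and save it instead of piping it to "sed -f -".
sed 's/\&/\\&/g
1s/^/s\&PLACE_HOLDER\&/
$!s/$/\\/
$s/$/\&/' "$tmp/input.txt" > "$tmp/replace.sed"

# The generated script is a single s command with & as its separator:
#   s&PLACE_HOLDER&a \& b\
#   second line&
script=$(cat "$tmp/replace.sed")
printf '%s\n' "$script"

# Standard input is free again, so the target could equally be a pipe.
result=$(sed -f "$tmp/replace.sed" "$tmp/targetfile")
printf '%s\n' "$result"

rm -rf "$tmp"
```

With GNU sed I would expect the placeholder to be replaced in the middle of the line, with the second input line continuing on the next output line. Backslashes in the input file are not escaped by this script, so treat it as a starting point rather than a complete solution.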
Using any awk:
$ awk '
BEGIN { old="PLACE_HOLDER" }
NR==FNR { new=(NR>1 ? new ORS : "") $0; next }
s=index($0,old) { $0=substr($0,1,s-1) new substr($0,s+length(old)) }
{ print }
' input.txt - <<< '<div>PLACE_HOLDER</div>'
<div>abc</div>
The above will work no matter which characters are present in the string you want to match or the file you want to replace it with.
This might work for you (GNU sed):
sed -i 's/PLACE_HOLDER/$(cat input.txt)/g;s/.*/echo "&"/e' file
Substitute the evaluated expression cat input.txt for each match of PLACE_HOLDER globally throughout file.

Input from one file and match it in other and print until a pattern match

I have two files. File1 contains the following IDs:
id/35651
id/35325
id/20993
id/30167
id/29807
id/28315
id/29759
id/27715
id/26884
id/30412
File2 contains multiple IDs in a pattern similar to File1, each followed by a multiline description. Now I want to print all the IDs, with their descriptions, from File2 that are present in File1.
File2 is huge. Here is a smaller version:
>id/30412
GCACACATTTTCTCGCGCTCTCTCCGGCTCTCCTTTGTTTATTTTCTAATCTATATTTTTACTGGAAGAT
TTCCTCTTTATTCTCTCCCGCCCTCCTACAAGCGCTCTTGCTGGCCGTCTGGGTGCACACACCGCTCCCT
CGATCACCCCAGCCCCCTTCCTGGTCTCCCGAGCGCGGGGTTTGAAGGTCACCTCCTTTCCAGTCCCCGT
GCGAGCCGCGCTGCCGCCGCCTCCTCCAGCCAGAGTCGGTGGGACTGGCTGCGCTGCCCTGAAGTGGTTC
TCCAAGCAGCGCGGAGGGTGGCGGACGGCGGACGGAGCCCAGGGGCCGCGTCGGGTGGGGAAACCCGAAC
>id/28315
TCGCGGAGGGGAATCCCTCCCCCTCCGCCCCAGCCCCCCAGCAGCACCCGCGGTGGGGCGGGGGCGCTCT
GCCAGCCCCGGGAACAGCAGAGGCGGCGGCACTGGCTGGACCCACGCGCGCGCCTCCGGGGCTGAAGAAG
GAAGGAGTGAGCCGAGCCGAGCACCCCACATCTGGAGGGGACAGCCAGCCGTGGGCCCCGCCCCGGCGTC
CGGAGCAGGAGAACTCCGAGCTTCTTGCCCAGGCAGAGAGAGCAGGAGCGGACCGCGCGCCCGGGATTGA
>id/2313
GAGTCCTTGCGCTCCAGACCCCCACCCAGTGGCCGCCAGGGTCCCCGCCTGTCCGGACCCTCGCCGCGCC
CAGGCAGGCGCGCCAGGGCGGGGCTGACCTGCCCGCGAAGTTGCGGACAGTGCGTGAGAAACCAGCACCC
CCTTTATGGAAACTGGTCAAAGAACTCATGCAAGTGGAACTTACAGCTTCCTTGATCGGACTCAGCATTC
AGGGCCCAGTTTGCTCCCCCGCAGAACGGTATCCCCGCGGAATACACGGCCCCTCATCCCCACCCCGCGC
CAGAGTACACAGGCCAGACCACGGTTCCCGAGCACACATTAAACCTGTACCCTCCCGCCCAGACGCACTC
>id/26884
CGAGCAGAGCCCGGCGGACACGAGCGCTCAGACCGTCTCTGGCACCGCCACACAGACAGATGACGCAGCA
CCGACGGATGGCCAGCCCCAGACACAACCTTCTGAAAACACGGAAAACAAGTCTCAGCCCAAGCGGCTGC
ATGTCTCCAATATCCCCTTCAGGTTCCGGGATCCGGACCTCAGACAAATGTTTGGTCAATTTGGTAAAAT
CTTAGATGTTGAAATTATTTTTAATGAGCGAGGCTCAAAGGGATTTGGTTTCGTAACTTTCGAAAATAGT
>id/29807
GCCGATGCGGACAGGGCGAGGGAGAAATTACACGGCACCGTGGTAGAGGGCCGTAAAATCGAGGTAAATA
ATGCCACAGCACGTGTAATGACAAATAAAAAGACCGTCAACCCTTATACAAATGGCTGGAAATTGAATCC
AGTTGTGGGTGCAGTCTACAGTCCCGAATTCTATGCAGCACGGTCCTGTTGTGCCAGGCCAACCAGGAGG
GATCTTCCATGTACAGTGCCCCCAGTTCACTTGTATATACTTCTGCAATGCCAGGCTTCCCGTATCCAGC
AGCCACCGCCGCGGCCGCCTACCGAGGGGCGCACCTGCGAGGCCGCGGTCGCACCGTGTACAACACCTTC
>id/980
AGGGCCGCGGCGCCCCCGCCCCCGATCCCGGCCTACGGCGGTGTTGTTTACCAGGATGGATTTTATGGTG
CAGACATTTATGGTGGTTATGCTGCATACCGCTACGCCCAGCCTACCCCTGCCACTGCCGCTGCCTACAG
TGACAGTTACGGACGAGTTTATGCTGCCGACCCCTACCACCACGCACTTGCTCCAGCCCCCACCTACGGC
GTTGGTGCCATGAATGCTTTTGCACCTTTGACTGATGCCAAGACTAGGAGCCATGCTGATGATGTGGGTC
TCGTTCTTTCTTCATTGCAGGCTAGTATATACCGAGGGGGATACAACCGTTTTGCTCCATACTAAATGAC
AAAACCATAAAAACCTTCCAATGTGGGGAGAAAGGAAGCTTTCCGAGGCCTGAGTATTGCAATACATGCA
GTAGTACATCATTTTAGCAACTCT
I can do it one by one with the following command:
sed -n -e '/id\/30412/,/id/p' File2
But I am not sure how to tell sed to get the input from File1.
Also, is it possible not to print the matching pattern id\number in the last line?
This might work for you (GNU sed):
sed 's|id/\(.*\)|\\#^>id/\1$#{:\1;n;/^>/ba;b\1}|' file1 |
sed -e ':a' -f - -e 'd' file2
Build a sed script from file1 and run it against file2.
For each id, build a loop that prints the current line, fetches the next line (n), and checks whether that line begins with >. If it does, the script branches to :a and checks for a new id; otherwise it branches to a label unique to the current id and continues printing.
Lines that do not match any id are deleted (d).
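It can help to look at the script that the first sed generates before piping it into the second one. This is a minimal sketch with a hypothetical two-line file1 in a scratch directory; the variable names tmp and generated are assumptions for illustration.

```shell
# Hypothetical two-line id list.
tmp=$(mktemp -d)
printf 'id/30412\nid/28315\n' > "$tmp/file1"

# First stage only: one address-plus-loop command per id.
generated=$(sed 's|id/\(.*\)|\\#^>id/\1$#{:\1;n;/^>/ba;b\1}|' "$tmp/file1")
printf '%s\n' "$generated"
# \#^>id/30412$#{:30412;n;/^>/ba;b30412}
# \#^>id/28315$#{:28315;n;/^>/ba;b28315}

rm -rf "$tmp"
```

Each generated line uses \#...# as the address delimiter so that the / inside the id needs no escaping, and the numeric part of the id doubles as the loop label.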

Extract filename from multiple lines in unix

I'm trying to extract the name of a file that has been generated by a Java program. This Java program spits out multiple lines and I know exactly what the format of the file name is going to be. The information text that the Java program is spitting out is as follows:
ABCASJASLEKJASDFALDSF
Generated file YANNANI-0008876_17.xml.
TDSFALSFJLSDJF;
I'm capturing the output in a variable and then applying a sed operator in the following format:
sed -n 's/.*\(YANNANI.\([[:digit:]]\).\([xml]\)*\)/\1/p'
The result set is:
YANNANI-0008876_17.xml.
However, my problem is that I want the extraction of the filename to stop at .xml. The last dot should never be extracted.
Is there a way to do this using sed?
Let's look at what your capture group actually captures:
$ grep 'YANNANI.\([[:digit:]]\).\([xml]\)*' infile
Generated file YANNANI-0008876_17.xml.
That's probably not what you intended:
\([[:digit:]]\) captures just a single digit (and the capture group around it doesn't do anything)
\([xml]\)* is "any of x, m or l, 0 or more times", so it matches the empty string (as above – or the line wouldn't match at all!), x, xx, lll, mxxxxxmmmmlxlxmxlmxlm, xml, ...
There is no way the final period is removed because you don't match anything after the capture groups
What would make sense instead:
Match "digits or underscores, 0 or more": [[:digit:]_]*
Match .xml, literally (escape the period): \.xml
Make sure the rest of the line (just the period, in this case) is matched by adding .* after the capture group
So the regex for the string you'd like to extract becomes
$ grep 'YANNANI.[[:digit:]_]*\.xml' infile
Generated file YANNANI-0008876_17.xml.
and to remove everything else on the line using sed, we surround regex with .*\( ... \).*:
$ sed -n 's/.*\(YANNANI.[[:digit:]_]*\.xml\).*/\1/p' infile
YANNANI-0008876_17.xml
This assumes you really meant . after YANNANI (any character).
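Since grep prints the whole matching line, the difference between the two patterns is easier to see with -o, which prints only the matched part. This is a small sketch; -o is a common extension (GNU and BSD grep) rather than POSIX, and the variable names line, original and fixed are assumptions for illustration.

```shell
line='Generated file YANNANI-0008876_17.xml.'

# What the original pattern actually matches: one digit, one more
# character, then zero repetitions of [xml].
original=$(printf '%s\n' "$line" | grep -o 'YANNANI.\([[:digit:]]\).\([xml]\)*')
printf 'original pattern matches: %s\n' "$original"   # YANNANI-00

# What the corrected pattern matches.
fixed=$(printf '%s\n' "$line" | grep -o 'YANNANI.[[:digit:]_]*\.xml')
printf 'corrected pattern matches: %s\n' "$fixed"     # YANNANI-0008876_17.xml
```

The original sed command only appeared to work because everything after the short match, including the final period, was left on the line untouched.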
You can call sed twice: first in printing and then in replacement mode:
sed -n 's/.*\(YANNANI.\([[:digit:]]\).\([xml]\)*\)/\1/p' | sed 's/\.$//g'
The last sed removes the trailing . from each line produced by the first sed,
or you can use an awk solution if you prefer:
awk '/.*YANNANI.[0-9]+.[0-9]+.xml/{print substr($NF,1,length($NF)-1)}'
this will print the last field (and truncate the last char of it using substr) of all the lines that do match your regex.

Using command line to remove text?

I have a huge file that contains lines that follow this format:
New-England-Center-For-Children-L0000392290
Southboro-Housing-Authority-L0000392464
Crew-Star-Inc-L0000391998
Saxony-Ii-Barber-Shop-L0000392491
Test-L0000392334
What I'm trying to do is narrow it down to just this:
New-England-Center-For-Children
Southboro-Housing-Authority
Crew-Star-Inc
Saxony-Ii-Barber-Shop
Test
Can anyone help with this?
Using GNU awk:
awk -F\- 'NF--' OFS=\- file
New-England-Center-For-Children
Southboro-Housing-Authority
Crew-Star-Inc
Saxony-Ii-Barber-Shop
Test
Set the input and output field separator to -.
NF contains number of fields. Reduce it by 1 to remove the last field.
Using sed:
sed 's/\(.*\)-.*/\1/' file
New-England-Center-For-Children
Southboro-Housing-Authority
Crew-Star-Inc
Saxony-Ii-Barber-Shop
Test
Simple greedy regex to match up to the last hyphen.
In replacement use the captured group and discard the rest.
Version 1 of the Question
The first version of the input was in the form of HTML and parts had to be removed both before and after the desired text:
$ sed -r 's|.*[A-Z]/([a-zA-Z-]+)-L0.*|\1|' input
Special-Restaurant
Eliot-Cleaning
Kennedy-Plumbing
Version 2 of the Question
In the revised question, it is only necessary to remove the text that starts with -L00:
$ sed 's|-L00.*||' input2
New-England-Center-For-Children
Southboro-Housing-Authority
Crew-Star-Inc
Saxony-Ii-Barber-Shop
Test
Both of these commands use a single "substitute" command. The command has the form s|old|new|.
The perl code for this would be: perl -nle'print $1 if(m{-.*?/(.*?-.*?)-})'
We can break the Regex down to matching the following:
- matches the dash between the city and state
.*? match the smallest set of character(s) that makes the Regex work, i.e. the State
/ matches the slash between the State and the data you want
( starts the capture of the data you are interested in
.*?-.*? will match the data you care about
) will close out the capture
- will match the dash before the L####### to give the regex something to match after your data. This will prevent the minimal Regex from matching 0 characters.
Then the print statement will print out what was captured (your data).
awk likes these things:
$ awk -F[/-] -v OFS="-" '{print $(NF-3), $(NF-2)}' file
Special-Restaurant
Eliot-Cleaning
Kennedy-Plumbing
This sets / and - as possible field separators. Based on them, it prints the fields at positions NF-3 and NF-2, separated by the delimiter -. Note that $NF stands for the last field, hence $(NF-1) is the penultimate, etc.
This sed is also helpful:
$ sed -r 's#.*/(\w*-\w*)-\w*\.\w*</loc>$#\1#' file
Special-Restaurant
Eliot-Cleaning
Kennedy-Plumbing
It selects the block word-word after a slash / and followed with word.word</loc> + end_of_line. Then, it prints back this block.
Update
Based on your new input, this will do it:
$ sed -r 's/(.*)-L\w*$/\1/' file
New-England-Center-For-Children
Southboro-Housing-Authority
Crew-Star-Inc
Saxony-Ii-Barber-Shop
Test
It selects everything up to the block -L + something + end of line, and prints it back.
You can use also another trick:
rev file | cut -d- -f2- | rev
Since you want all of the fields separated by - except the last one, reverse each line, keep everything from the second field onward, and reverse the result back.
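The three stages can be run one at a time to see what each contributes. This is a minimal sketch on a single sample line from the question; the variable names are assumptions for illustration, and rev is assumed to be installed (it ships with util-linux and the BSDs but is not part of POSIX).

```shell
line='Crew-Star-Inc-L0000391998'

# Stage 1: reverse the line so the unwanted last field becomes the first.
reversed=$(printf '%s\n' "$line" | rev)

# Stage 2: drop the first field, keeping field 2 onward.
trimmed=$(printf '%s\n' "$reversed" | cut -d- -f2-)

# Stage 3: reverse back to restore the original character order.
result=$(printf '%s\n' "$trimmed" | rev)

printf '%s\n' "$reversed" "$trimmed" "$result"
```

The trick works because cut can say "from field N to the end" but has no way to say "all but the last field".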
Here's how I'd do it with Perl:
perl -nle 'm{example[.]com/bp/(.*?)/(.*?)-L\d+[.]htm} && print $2' filename
Note: the original question was matching input lines like this:
<loc>http://www.example.com/bp/Lowell-MA/Special-Restaurant-L0000423916.htm</loc>
<loc>http://www.example.com/bp/Houston-TX/Eliot-Cleaning-L0000422797.htm</loc>
<loc>http://www.example.com/bp/New-Orleans-LA/Kennedy-Plumbing-L0000423121.htm</loc>
The -n option tells Perl to loop over every line of the file (but not print them out).
The -l option adds a newline onto the end of every print
The -e 'perl-code' option executes perl-code for each line of input
The pattern:
/regex/ && print
Will only print if the regex matches. If the regex contains capture parentheses you can refer to the first captured section as $1, the second as $2 etc.
If your regex contains slashes, it may be cleaner to use a different regex delimiter ('m' stands for 'match'):
m{regex} && print
If you have a modern Perl, you can use -E to enable modern feature and use say instead of print to print with a newline appended:
perl -nE 'm{example[.]com/bp/(.*?)/(.*?)-L\d+[.]htm} && say $2' filename
This is very concise in Perl
perl -i.bak -lpe's/-[^-]+$//' myfile
Note that this will modify the input file in place but keep a backup of the original data in a file called myfile.bak

sed: replace pattern only if followed by empty line

I need to replace a pattern in a file, only if it is followed by an empty line. Suppose I have following file:
test
test

test

...
the following command would replace all occurrences of test with xxx
cat file | sed 's/test/xxx/g'
but I need to replace test only if the next line is empty. I have tried matching a hex code, but that does not work:
cat file | sed 's/test\x0a/xxx/g'
The desired output should look like this:
test
xxx

xxx

...
Suggested solutions for sed, perl and awk:
sed
sed -rn '1h;1!H;${g;s/test([^\n]*\n\n)/xxx\1/g;p;}' file
I got the idea from sed multiline search and replace. Basically slurp the entire file into sed's hold space and do global replacement on the whole chunk at once.
perl
$ perl -00 -pe 's/test(?=[^\n]*\n\n)$/xxx/m' file
-00 triggers paragraph mode which makes perl read chunks separated by one or several empty lines (just what OP is looking for). Positive look ahead (?=) to anchor substitution to the last line of the chunk.
Caveat: -00 will squash multiple empty lines into single empty lines.
awk
$ awk 'NR==1 {l=$0; next}
/^$/ {gsub(/test/,"xxx", l)}
{print l; l=$0}
END {print l}' file
Basically store previous line in l, substitute pattern in l if current line is empty. Print l. Finally print the very last line.
Output in all three cases
test
xxx

xxx

...
This might work for you (GNU sed):
sed -r '$!N;s/test(\n\s*)$/xxx\1/;P;D' file
Keep a window of 2 lines throughout the length of the file and if the second line is empty and the first line contains the pattern then make a substitution.
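Here is a minimal sketch of that two-line window on a small sample. The sample file (with its blank lines), the scratch directory and the variable names are assumptions for illustration. I have also swapped -r for -E and \s for [[:space:]], on the assumption that this makes the command acceptable to non-GNU seds as well.

```shell
# Hypothetical input: the second and third "test" are followed by an empty line.
tmp=$(mktemp -d)
printf 'test\ntest\n\ntest\n\n...\n' > "$tmp/file"

# $!N  append the next line, so the pattern space holds "current\nnext"
# s    replace test only when everything after the newline is blank
# P    print up to the first newline
# D    delete up to the first newline and restart with what remains
result=$(sed -E '$!N;s/test(\n[[:space:]]*)$/xxx\1/;P;D' "$tmp/file")
printf '%s\n' "$result"

rm -rf "$tmp"
```

Because D restarts the cycle without reading a new line, every input line gets a turn as both the second and the first line of the window.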
Using sed
sed -r ':a;$!{N;ba};s/test([^\n]*\n(\n|$))/xxx\1/g'
explanation
:a # set label a
$ !{ # if not end of file
N # Add a newline to the pattern space, then append the next line of input to the pattern space
b a # Unconditionally branch to label. The label may be omitted, in which case the next cycle is started.
}
# simply, above command :a;$!{N;ba} is used to read the whole file into the pattern space.
s/test([^\n]*\n(\n|$))/xxx\1/g # replace the key word if next line is empty (\n\n) or the input ends there ($)