Replace string with file content - sed

$ cat input.txt
abc
$ sed -e '/PLACE_HOLDER/ {
s/PLACE_HOLDER//g
r input.txt
}' <<< '<div>PLACE_HOLDER</div>'
<div></div>
abc
I am trying to replace PLACE_HOLDER with the content of a file, but the r command pastes the file content after the matching line. How can I replace just the match?
This is not a duplicate of
Use the contents of a file to replace a string using SED
None of the answers there addresses my question specifically. The second one uses a bash variable, which is not appropriate when the file is very large. The first one does not address the problem in my example; in fact, my code is exactly the same as the first answer.

As you discovered, the r command inserts new lines after the current line.
That's not suitable if you want to embed the contents of another file in the middle of a line whose surrounding text should be kept.
A crude fix is to build a sed script from your input file. Note that any & characters in the input file then have to be escaped, as do any literal newlines. (Backslashes in the input file would need escaping too; the script below does not handle them.)
Because we will be escaping ampersands anyway, I decided to use & as the separator for the s command, too.
sed 's/\&/\\&/g
1s/^/s\&PLACE_HOLDER\&/
$!s/$/\\/
$s/$/\&/' input.txt |
sed -f - targetfile
Unfortunately, because standard input is tied to -f - your script can't process standard input for replacements. A simple workaround for that is to save the generated sed script to a temporary file and pass that as the value for the -f option; this will also be necessary if your sed is one which does not accept the script on standard input.
I believe this should be reasonably portable, apart from the notes about -f - above.
Demo: https://ideone.com/oVgIni
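The temporary-file workaround described above could look like the following sketch. The directory created by mktemp, the file names input.txt and script.sed, and the two-line sample content are all placeholders chosen for illustration.

```shell
# Work in a throwaway directory; all file names here are illustrative.
work=$(mktemp -d)
printf 'abc\ndef\n' > "$work/input.txt"

# Generate the s&PLACE_HOLDER&...& command and save it instead of piping it.
sed 's/\&/\\&/g
1s/^/s\&PLACE_HOLDER\&/
$!s/$/\\/
$s/$/\&/' "$work/input.txt" > "$work/script.sed"

# Standard input is now free to carry the text to be edited.
result=$(printf '%s\n' '<div>PLACE_HOLDER</div>' | sed -f "$work/script.sed")
printf '%s\n' "$result"

rm -rf "$work"
```

With this arrangement the generated script is an ordinary file, so the same approach should also work with a sed that does not accept -f - at all.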

Using any awk:
$ awk '
BEGIN { old="PLACE_HOLDER" }
NR==FNR { new=(NR>1 ? new ORS : "") $0; next }
s=index($0,old) { $0=substr($0,1,s-1) new substr($0,s+length(old)) }
{ print }
' input.txt - <<< '<div>PLACE_HOLDER</div>'
<div>abc</div>
The above will work no matter which characters are present in the string you want to match or the file you want to replace it with.
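Note that index() finds only the first occurrence on each line. If a line may contain the placeholder more than once, a loop can handle it. The following is a sketch of that variation, not part of the original answer; the temporary directory, the sample replacement text a&b\1 and the variable out are made up for the demonstration.

```shell
work=$(mktemp -d)
# Replacement text with characters that would be special to sed.
printf '%s\n' 'a&b\1' > "$work/input.txt"

result=$(printf '%s\n' '<p>PLACE_HOLDER and PLACE_HOLDER</p>' |
awk '
BEGIN { old = "PLACE_HOLDER" }
# First file: collect the replacement text, joining lines with ORS.
NR == FNR { new = (NR > 1 ? new ORS : "") $0; next }
{
    out = ""
    # Consume the line piece by piece, swapping each literal match.
    while ((s = index($0, old)) > 0) {
        out = out substr($0, 1, s - 1) new
        $0 = substr($0, s + length(old))
    }
    print out $0
}
' "$work/input.txt" -)
printf '%s\n' "$result"

rm -rf "$work"
```

Because only index() and substr() are used, neither the placeholder nor the replacement is ever interpreted as a regular expression.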

This might work for you (GNU sed):
sed -i 's/PLACE_HOLDER/$(cat input.txt)/g;s/.*/echo "&"/e' file
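The first command replaces each PLACE_HOLDER with the literal text $(cat input.txt). The second wraps every line in echo "..." and, through the e flag, hands it to the shell, which expands the command substitution. Because every line of file passes through the shell, any $, backtick, backslash or double quote already present in file is interpreted as well, so this is only safe for trusted input.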

Related

How to append data at a particular line in a file using sed , where data is from another file

Suppose I have a config file, for example file1.config, whose contents are:
flag_data_to_be_appended=xyz
and I have another file which is a shell script, for example file2.sh, whose contents are:
./file.config
flag=abc
echo $flag
Now I need to append the information from file1 to file2 at flag, i.e. the line for flag has to look like:
flag=abc xyz
How can I do this with the sed command?
Why not have sed write its own script?
sed -e "$(sed -e 's|^\(.*\)_data_to_be_appended=\(.*\)|/^\1=.*/ s//\& \2/|' cfg)" script
Inner command reads the config file and emits /^flag=.*/ s//& xyz/
which is then applied to the script file.
Output:
./file.config
flag=abc xyz
echo $flag
The two escaped parenthesis pairs capture key and value as \1 and \2.
In s//& \2/ the // is the null regex which matches the last
regex used (in /^…/) and replaces the entire match (&) followed
by the captured value.
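To see the two stages separately, the generated script can be captured in a variable first. This is only a sketch: the temporary file stands in for the config file, and the variable names generated and result are made up here.

```shell
cfg=$(mktemp)
printf '%s\n' 'flag_data_to_be_appended=xyz' > "$cfg"

# Stage 1: turn each key/value line of the config into a sed command.
generated=$(sed -e 's|^\(.*\)_data_to_be_appended=\(.*\)|/^\1=.*/ s//\& \2/|' "$cfg")
printf '%s\n' "$generated"

# Stage 2: apply the generated command to the script text.
result=$(printf '%s\n' './file.config' 'flag=abc' 'echo $flag' | sed -e "$generated")
printf '%s\n' "$result"

rm -f "$cfg"
```

Printing the intermediate script is a handy way to debug this kind of generated sed before applying it to a real file.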
This might work for you (GNU sed):
sed '/^flag=/s#.*#sed "s/.*=/& /" file1#e' file2
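Match the line starting with flag= in file2 and replace it with the output of a second sed invocation, which the e flag runs. The & in the right-hand side expands to the matched line, so the command actually executed is sed "s/.*=/flag=abc /" file1. That prints the single line of file1 with everything up to and including the = replaced by flag=abc and a space.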

sed - Replace comma after first regex match

I'm trying to perform the following substitution on lines of the general format:
BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC, .......
As you can see, the problem is that it's a comma-separated file in which one field contains a decimal comma. I would like to replace that comma with a dot.
I've tried the following to replace the first occurrence of a pattern after a match, but to no avail. Could someone help me?
sed -e '/,"/!b' -e "s/,/./"
sed -e '/"/!b' -e ':a' -e "s/,/\./"
Thanks in advance. An awk or perl solution would help me as well. Here's an awk effort:
gawk -F "," 'substr($10, 0, 3)==3 && length($10)==12 { gsub(/,/,".", $10); print}'
That yielded the same file unchanged.
CSV files should be parsed in GNU awk with a proper FPAT variable that defines what constitutes a valid field in such a file. Once you do that, you can just iterate over the fields to do the substitution you need:
gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")"; OFS="," }
{ for(i=1; i<=NF;i++) if ($i ~ /[,]/) gsub(/[,]/,".",$i);}1' file
See this answer of mine to understand how to define and parse CSV file content with FPAT variable. Also see Save modifications in place with awk to do in-place file modifications like sed -i''.
The following sed (GNU syntax, because of \? and \+) converts the decimal comma in quoted numeric fields:
sed 's/"\([-+]\?[0-9]*\),\([0-9]\+\([eE][-+]\?[0-9]\+\)\?\)"/"\1.\2"/g'
See: https://www.regular-expressions.info/floatingpoint.html
This might work for you (GNU sed):
sed -E ':a;s/^([^"]*("[^",]*"[^"]*)*"[^",]*),/\1./;ta' file
This regexp matches a comma within a pair of double quotes and replaces it with a dot. The regexp is anchored to the start of the line, so it has to be repeated until nothing more matches; hence the :a and ta commands, which loop back for as long as a substitution succeeds.
N.B. The solution expects that all double quotes are matched and that no double quotes are quoted i.e. \" does not appear in a line.
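A small check of the loop on a line with two quoted fields, each containing a comma. The sample line is invented for illustration.

```shell
line='a,"1,5",b,"2,75",c'
# Each pass fixes the leftmost comma that is still inside quotes; ta repeats.
result=$(printf '%s\n' "$line" |
    sed -E ':a;s/^([^"]*("[^",]*"[^"]*)*"[^",]*),/\1./;ta')
printf '%s\n' "$result"
```

The commas that separate fields are left alone because the pattern only ever consumes complete, comma-free quoted fields before the quote it is currently inside.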
If your input always follows that format of only one quoted field containing 1 comma then all you need is:
$ sed 's/\([^"]*"[^"]*\),/\1./' file
BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109.07",DF,CCCCCCCCCCC, .......
If it's more complicated than that then see What's the most robust way to efficiently parse CSV using awk?.
Assuming you have this:
BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC
Try this:
awk -F',' '{print $1,$2,$3,$4"."$5,$6,$7}' filename | awk '$1=$1' FS=" " OFS=","
Output will be:
BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109.07",DF,CCCCCCCCCCC
You simply need to know the field numbers for replacing the field separator between them.
To write the regexp with unescaped ( ) and +, as you would in perl, enable extended regular expressions with -r (-E is the more portable spelling).
So if you want to replace all numbers and omit the " sign, then you can use this:
echo 'BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC, .......'|sed -r 's/\"([0-9]+)\,([0-9]+)\"/\1\.\2/g'
If you want to replace first occurrence only you can use that:
echo 'BBBBBBB.2018_08,XXXXXXXXXXXXX,01/01/2014,"109,07",DF,CCCCCCCCCCC, .......'|sed -r 's/\"([0-9]+)\,([0-9]+)\"/\1\.\2/1'
https://www.gnu.org/software/sed/manual/sed.txt

Select specific items from a file using sed

I'm very much a junior when it comes to the sed command, and my Bruce Barnett guide sits right next to me, but one thing has been troubling me. With a file, can you filter it using sed to select only specific items? For example, in the following file:
alpha|november
bravo|october
charlie|papa
alpha|quebec
bravo|romeo
charlie|sahara
Would it be possible to set a command to return only the bravos, like:
bravo|october
bravo|romeo
With sed:
sed '/^bravo|/!d' filename
Alternatively, with grep (because it's sort of made for this stuff):
grep '^bravo|' filename
or with awk, which works nicely for tabular data,
awk -F '|' '$1 == "bravo"' filename
The first two use a regular expression, selecting those lines that match it. In ^bravo|, ^ matches the beginning of the line and bravo| the literal string bravo|, so this selects all lines that begin with bravo|.
The awk way splits the line across the field separator | and selects those lines whose first field is bravo.
You could also use a regex with awk:
awk '/^bravo\|/' filename
...but I don't think this plays to awk's strengths in this case.
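As a quick check that the sed, grep and field-based awk commands select the same lines, here is a sketch run against the sample data from the question. The temporary file and the variable names are placeholders.

```shell
data=$(mktemp)
printf '%s\n' 'alpha|november' 'bravo|october' 'charlie|papa' \
              'alpha|quebec' 'bravo|romeo' 'charlie|sahara' > "$data"

by_sed=$(sed '/^bravo|/!d' "$data")           # delete everything that does not match
by_grep=$(grep '^bravo|' "$data")             # keep everything that matches
by_awk=$(awk -F '|' '$1 == "bravo"' "$data")  # compare the first field as a string

printf '%s\n' "$by_awk"
rm -f "$data"
```

The awk version compares a whole field, so a first field such as bravo2 would not be selected by accident.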
Another solution with sed:
sed -n '/^bravo|/p' filename
-n option => no printing by default.
If line begins with bravo|, print it (p)
There are (at least) two ways with sed.
Removing the unwanted lines:
sed '/^bravo|/ !d' YourFile
Printing only the wanted lines:
sed -n '/^bravo|/ p' YourFile
If there is no other constraint or action, both are equivalent, and grep is the better tool.
If further commands follow, the two differ: d ends the cycle and moves straight to the next line, whereas p prints the line and then carries on with the remaining commands.
Note that the pipe must not be escaped here: in a basic regular expression | is a literal character, while GNU sed treats \| as alternation, so /^bravo\|/ would also match the empty string and typically select every line.
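How | behaves depends on the regex flavour in use. The following sketch compares basic and extended mode on a shortened copy of the sample data; the temporary file and variable names are illustrative.

```shell
data=$(mktemp)
printf '%s\n' 'alpha|november' 'bravo|october' 'bravo|romeo' > "$data"

# Basic regular expressions (the default): a bare | is a literal pipe.
bre=$(sed -n '/^bravo|/p' "$data")

# Extended regular expressions (-E): | means alternation,
# so it is escaped to get a literal pipe.
ere=$(sed -E -n '/^bravo\|/p' "$data")

printf '%s\n' "$bre"
rm -f "$data"
```

Both commands print the same two bravo lines; only the spelling of the literal pipe changes.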

Using command line to remove text?

I have a huge file that contains lines that follow this format:
New-England-Center-For-Children-L0000392290
Southboro-Housing-Authority-L0000392464
Crew-Star-Inc-L0000391998
Saxony-Ii-Barber-Shop-L0000392491
Test-L0000392334
What I'm trying to do is narrow it down to just this:
New-England-Center-For-Children
Southboro-Housing-Authority
Crew-Star-Inc
Test
Can anyone help with this?
Using GNU awk:
awk -F\- 'NF--' OFS=\- file
New-England-Center-For-Children
Southboro-Housing-Authority
Crew-Star-Inc
Saxony-Ii-Barber-Shop
Test
Set the input and output field separator to -.
NF contains number of fields. Reduce it by 1 to remove the last field.
Using sed:
sed 's/\(.*\)-.*/\1/' file
New-England-Center-For-Children
Southboro-Housing-Authority
Crew-Star-Inc
Saxony-Ii-Barber-Shop
Test
Simple greedy regex to match up to the last hyphen.
In replacement use the captured group and discard the rest.
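If the lines are already being handled in a shell loop, the same "drop everything from the last hyphen" idea can be expressed with parameter expansion instead of sed. This is an alternative sketch rather than part of the answer above, and the function name strip_last_field is made up.

```shell
strip_last_field() {
    # Read lines from standard input and remove the shortest
    # suffix matching -* (the last hyphen and what follows it).
    while IFS= read -r line; do
        printf '%s\n' "${line%-*}"
    done
}

result=$(printf '%s\n' 'Crew-Star-Inc-L0000391998' 'Test-L0000392334' |
    strip_last_field)
printf '%s\n' "$result"
```

For a huge file, sed or awk will usually be much faster than a shell read loop, so this is mainly useful for a handful of strings.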
Version 1 of the Question
The first version of the input was in the form of HTML and parts had to be removed both before and after the desired text:
$ sed -r 's|.*[A-Z]/([a-zA-Z-]+)-L0.*|\1|' input
Special-Restaurant
Eliot-Cleaning
Kennedy-Plumbing
Version 2 of the Question
In the revised question, it is only necessary to remove the text that starts with -L00:
$ sed 's|-L00.*||' input2
New-England-Center-For-Children
Southboro-Housing-Authority
Crew-Star-Inc
Saxony-Ii-Barber-Shop
Test
Both of these commands use a single "substitute" command. The command has the form s|old|new|.
The Perl code for this would be: perl -nle 'print $1 if m{-.*?/(.*?-.*?)-}'
We can break the regex down as follows:
- matches the dash between the city and the state
.*? matches the smallest set of characters that makes the regex work, i.e. the state
/ matches the slash between the state and the data you want
( starts the capture of the data you are interested in
.*?-.*? matches the data you care about
) closes the capture
- matches the dash before the L####### to give the regex something to match after your data. This prevents the minimal match from matching 0 characters.
Then the print statement prints what was captured (your data).
awk likes these things:
$ awk -F'[/-]' -v OFS="-" '{print $(NF-3), $(NF-2)}' file
Special-Restaurant
Eliot-Cleaning
Kennedy-Plumbing
This sets / and - as field separators and prints the fields $(NF-3) and $(NF-2), counted back from the last field and joined by -. Note that $NF is the last field, hence $(NF-1) is the penultimate, and so on.
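Counting from the end matters because the city part may itself contain extra hyphens. The following sketch runs the command on the three sample lines of the original question; the bracket expression is quoted so the shell cannot treat it as a glob, and the temporary file is a placeholder.

```shell
data=$(mktemp)
cat > "$data" <<'EOF'
<loc>http://www.example.com/bp/Lowell-MA/Special-Restaurant-L0000423916.htm</loc>
<loc>http://www.example.com/bp/Houston-TX/Eliot-Cleaning-L0000422797.htm</loc>
<loc>http://www.example.com/bp/New-Orleans-LA/Kennedy-Plumbing-L0000423121.htm</loc>
EOF

# New-Orleans-LA adds one more field at the front,
# but the fields counted from the end stay in the same place.
result=$(awk -F'[/-]' -v OFS="-" '{print $(NF-3), $(NF-2)}' "$data")
printf '%s\n' "$result"
rm -f "$data"
```

The approach still assumes that the business name consists of exactly two hyphen-separated words, as in these samples.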
This sed is also helpful:
$ sed -r 's#.*/(\w*-\w*)-\w*\.\w*</loc>$#\1#' file
Special-Restaurant
Eliot-Cleaning
Kennedy-Plumbing
It selects the block word-word after a slash / and followed with word.word</loc> + end_of_line. Then, it prints back this block.
Update
Based on your new input, this can make it:
$ sed -r 's/(.*)-L\w*$/\1/' file
New-England-Center-For-Children
Southboro-Housing-Authority
Crew-Star-Inc
Saxony-Ii-Barber-Shop
Test
It selects everything up to the block -L + something + end of line, and prints it back.
You can use also another trick:
rev file | cut -d- -f2- | rev
Since you want every --separated field except the last one, reverse each line, take everything from the 2nd field onwards, and then reverse the result back.
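A short run of the rev/cut pipeline, including a line without any hyphen. The third sample line is invented to show that case.

```shell
result=$(printf '%s\n' 'Crew-Star-Inc-L0000391998' 'Test-L0000392334' 'NoHyphen' |
    rev | cut -d- -f2- | rev)
printf '%s\n' "$result"
```

Without the -s option, cut passes lines that contain no delimiter through unchanged, so NoHyphen survives intact.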
Here's how I'd do it with Perl:
perl -nle 'm{example[.]com/bp/(.*?)/(.*?)-L\d+[.]htm} && print $2' filename
Note: the original question was matching input lines like this:
<loc>http://www.example.com/bp/Lowell-MA/Special-Restaurant-L0000423916.htm</loc>
<loc>http://www.example.com/bp/Houston-TX/Eliot-Cleaning-L0000422797.htm</loc>
<loc>http://www.example.com/bp/New-Orleans-LA/Kennedy-Plumbing-L0000423121.htm</loc>
The -n option tells Perl to loop over every line of the file (but not print them out).
The -l option adds a newline onto the end of every print
The -e 'perl-code' option executes perl-code for each line of input
The pattern:
/regex/ && print
Will only print if the regex matches. If the regex contains capture parentheses you can refer to the first captured section as $1, the second as $2 etc.
If your regex contains slashes, it may be cleaner to use a different regex delimiter ('m' stands for 'match'):
m{regex} && print
If you have a modern Perl, you can use -E to enable modern features and use say instead of print to print with a newline appended:
perl -nE 'm{example[.]com/bp/(.*?)/(.*?)-L\d+[.]htm} && say $2' filename
This is very concise in Perl
perl -i.bak -lpe's/-[^-]+$//' myfile
Note that this will modify the input file in place but will keep a backup of the original data in a file called myfile.bak

Remove from the beginning till certain part in a string

I work with strings like
abc_dsdsds_ss_gsgsdsfsdf_ewew_wewewewewew_adf
and I need a new string in which everything from the beginning up to the last "_" has been removed, keeping that "_" and the characters after it (there can be 3, 4, or any number of them)
so in this case I would get
_adf
How could I do it with "sed" or another bash tool?
Regular expression pattern matching is greedy. Hence ^.*_ will match all characters up to and including the last _. Then just put the underscore back in:
echo abc_dsdsds_ss_gsgsdsfsdf_ewew_wewewewewew_adf | sed 's/^.*_/_/'
sed -E 's/^(.*)_([^_]*)$/_\2/' < input.txt
Do you need to modify the string, or just find everything after the last underscore? The regex to find the last _{anything} would be /(_[^_]+)$/ ($ matches the end of the string), or if you also want to match a trailing underscore with nothing after it, /(_[^_]*)$/.
Unless you really need to modify the string in place rather than just find this piece, or you really want to do this from the command line rather than in a script, matching with this regex is a bit simpler (you tagged this with perl, so I wasn't sure how committed you were to the command line as opposed to a simple script).
If you do need to modify the file in place, use sed -E -i 's/.*(_[^_]+)$/\1/' myfile. The -E flag enables the unescaped ( ) and + used here, and the leading .* makes the substitution discard everything before the last underscore. The -i flag overwrites the old file with the new one. If you want to create a new file and not clobber the old one, use sed -E 's/.../.../' oldfile > newfile. A g after the s/// replaces every match on a line rather than only the first; it makes no difference here because the pattern is anchored with $.
If the string is not by itself at the end of the line but embedded in other text and separated from it by whitespace, replace the $ with \s (a GNU extension), which matches a whitespace character (the end of a word).
If you have strings like these in bash variables (I don't see that specified in the question), you can use parameter expansion:
s="abc_dsdsds_ss_gsgsdsfsdf_ewew_wewewewewew_adf"
t="_${s##*_}"
echo "$t" # ==> _adf
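The doubled ## matters here: it removes the longest prefix matching *_, whereas a single # removes the shortest. A small sketch with the same string follows; the variable names longest and shortest are made up.

```shell
s="abc_dsdsds_ss_gsgsdsfsdf_ewew_wewewewewew_adf"

longest="_${s##*_}"    # strip up to the LAST underscore, then put one back
shortest="_${s#*_}"    # strip only up to the FIRST underscore

printf '%s\n' "$longest"
printf '%s\n' "$shortest"
```

Both forms are plain POSIX parameter expansion, so no external process is started.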
In Perl, you could do this:
my $string = "abc_dsdsds_ss_gsgsdsfsdf_ewew_wewewewewew_adf";
if ( $string =~ m/(_[^_]+)$/ ) {
print $1;
}
[Edit]
A Perl one liner approach (ie, can be run from bash directly):
perl -lne 'm/(_[^_]+)$/ && print $1;' infile > outfile
Or using substitution:
perl -pe 's/.*(_[^_]+)$/$1/' infile > outfile
Just group the last non-underscore characters preceded by the last underscore with \(_[^_]*\), then reference this group with \1:
sed 's/^.*\(_[^_]*\)$/\1/'
Result:
$ echo abc_dsdsds_ss_gsgsdsfsdf_ewew_wewewewewew_adf | sed 's/^.*\(_[^_]*\)$/\1/'
_adf
A Perl way:
echo 'abc_dsdsds_ss_gsgsdsfsdf_ewew_wewewewewew_adf' | \
perl -e 'print ((split/(_)/,<>)[-2..-1])'
output:
_adf
Just for fun:
echo abc_dsdsds_ss_gsgsdsfsdf_ewew_wewewewewew_adf | tr _ '\n' | tail -n 1 | rev | tr '\n' _ | rev