How to replace text using a greedy approach in sed?

I am parsing a file that contains some HTML tags and converting them into LaTeX commands.
cat test
<Text>A <strong>ASDFF</strong> is a <em>cerebrovafdfasfscular</em> condifasdftion caufadfsed fasdfby tfdashe locfsdafalised <span style="text-decoration: underline;">ballooning</span> or difdaslation of an arfdatery in thdfe bfdasrain. Smadfsall aasdneurysms may dadisplay fdasno ofadsbvious sdfasigns (<span style="text-decoration: underline;"><em><strong>asymptomatic</strong></em></span>) bfdasut lfdsaarger afdasneurysms maydas besda asfdsasociated widfth sdsfudd
sed -e 's|<strong>\(.*\)</strong>|\\textbf{\1}|g' test > out
cat out
<Text>A \textbf{ASDFF</strong> is a <em>cerebrovafdfasfscular</em> condifasdftion caufadfsed fasdfby tfdashe locfsdafalised <span style="text-decoration: underline;">ballooning</span> or difdaslation of an arfdatery in thdfe bfdasrain. Smadfsall aasdneurysms may dadisplay fdasno ofadsbvious sdfasigns (<span style="text-decoration: underline;"><em><strong>asymptomatic}</em></span>) bfdasut lfdsaarger afdasneurysms maydas besda asfdsasociated widfth sdsfudd
The expected output is \textbf{ASDFF}, but I get \textbf{ASDFF .........}. How can I get the expected result?
Regards

You may want to use perl regex instead.
perl -pe 's|<strong>(.*?)</strong>|\\textbf{$1}|g'
Your problem is similar to non-greedy-regex-matching-in-sed. Next time, you may want to simplify your case to point out the real problem. For example, instead of pasting the raw HTML code, use something like this:
fooTEXT1barfooTEXT2bar
Update
If you actually want the greedy approach, ignore this.
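If perl is not an option, plain sed can get the same result by replacing .* with a negated character class, so the match cannot run past the next "<". A minimal sketch, assuming the text between <strong> and </strong> never contains another tag; the function name strong_to_textbf is hypothetical:

```shell
# Hypothetical helper. Assumes no "<" appears between <strong> and </strong>.
strong_to_textbf() {
  # [^<]* stops at the first "<", so each <strong>...</strong> pair is
  # converted on its own instead of one greedy match spanning to the last </strong>.
  sed -e 's|<strong>\([^<]*\)</strong>|\\textbf{\1}|g' "$@"
}
```

Usage: strong_to_textbf test > out. Unlike the perl .*? version, this leaves nested markup such as <strong><em>x</em></strong> unconverted.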

Related

How to remove text between a string and a space using SED

I have a file with repeating lines in it like this:
<stack-block name="B" sub-type="SBL" type="ABM_BLOCK" level="2" parent-name="PBTYRD" geo-anchor-latitude="-34.96723069348281" geo-anchor-longitude="150.2157080161554" geo-anchor-orientation="72.35290364141252" z-index-min="1" />
<stack-block name="C" sub-type="SBL" type="ABM_BLOCK" level="2" parent-name="PBTYRD" geo-anchor-latitude="-34.967529872288864" geo-anchor-longitude="150.2145108805486" geo-anchor-orientation="72.35290364141252" z-index-min="1" />
...and so on...
I want to remove the geo-anchor-latitude="-34.96723069348281" section from each line of the file, starting at the geo-anchor-latitude phrase and ending at the second double quote.
I have tried sed -i 's/geo-anchor-latitude.*"//' filename with no luck, as it strips everything from geo-anchor-latitude to the end of the line.
Any clues out there? Thanks.
Would you try the following:
sed -i 's/geo-anchor-latitude="[^"]*"//' filename
Output:
<stack-block name="B" sub-type="SBL" type="ABM_BLOCK" level="2" parent-name="PBTYRD" geo-anchor-longitude="150.2157080161554" geo-anchor-orientation="72.35290364141252" z-index-min="1" />
<stack-block name="C" sub-type="SBL" type="ABM_BLOCK" level="2" parent-name="PBTYRD" geo-anchor-longitude="150.2145108805486" geo-anchor-orientation="72.35290364141252" z-index-min="1" />
The regex geo-anchor-latitude="[^"]*" matches a substring consisting of:
A literal string geo-anchor-latitude="
Followed by a sequence of any characters except for "
Followed by a double quote "
Then the matched substring above is removed by the s command.
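A minimal sketch of the same substitution set up for a dry run (no -i, so it only prints the result). It also puts the leading space inside the match, which the answer does not do, so removing the attribute does not leave two spaces behind. The helper name drop_latitude is hypothetical:

```shell
# Hypothetical helper; prints to stdout instead of editing in place.
drop_latitude() {
  # The leading space is part of the match, so no double space is left.
  # [^"]* stops at the closing quote instead of running to the last quote on the line.
  sed 's/ geo-anchor-latitude="[^"]*"//' "$@"
}
```

Once the output looks right, the same expression can be used with sed -i on the real file.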
You can use extended regular expressions (-E) with sed to do this.
sed -Ei 's/geo-anchor-latitude="[-0-9]+[.][0-9]+"//' filename
This regex looks for the latitude attribute followed by a quoted decimal number: one or more digits (or minus signs), a literal dot, then one or more digits.
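Note that [-0-9]+ also accepts values such as 3-4-.5, because the class allows a minus sign in any position. A stricter sketch, assuming at most one leading minus sign is valid; the helper name is hypothetical and it prints instead of using -i:

```shell
# Hypothetical helper. -? allows at most one leading minus sign,
# [0-9]+\.[0-9]+ requires digits, one literal dot, then digits.
drop_numeric_latitude() {
  sed -E 's/ geo-anchor-latitude="-?[0-9]+\.[0-9]+"//' "$@"
}
```

Values that are not plain decimal numbers are left untouched rather than removed.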

How to find a line in a .css file and then change the related config between the following braces

Let me start off by saying I'm just starting to dabble in sed, awk and regex.
Here's what I need help with.
On Ubuntu, in /etc/alternative/gdm3.css I have this config section:
.login-dialog-banner {
color: #d6d6d1; }
I need it to be
.login-dialog-banner{
color: rgba(255,255,255,1);
font-size: 14;
text-align: center;}
I am lost on how to first find .login-dialog-banner and then change the data inside the { } block that follows it.
Would you try the following:
sed '
/\.login-dialog-banner[[:blank:]]*{/{ ;# if the specified 1st line is found
$!{ n ;# and the current line is not the last line, then print it and read the next line
s/.*color:.*/color: rgba(255,255,255,1);\
font-size: 14;\
text-align: center;}/ ;# if the next line contains "color:"
;# then replace the line with the specified lines
}
}' /etc/alternative/gdm3.css
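To try this on a throwaway sample before pointing it at the real gdm3.css, the script can be wrapped in a function. A minimal sketch, assuming the layout shown in the question (selector on one line, the color: line directly below it); the function name rewrite_banner is hypothetical and the inline comments have been moved out of the sed script:

```shell
# Hypothetical wrapper around the answer above.
# On the selector line (unless it is the last line), n prints it and reads
# the next line; if that line contains "color:", it is replaced by three lines.
# Each backslash-newline in the replacement becomes a literal newline.
rewrite_banner() {
  sed '/\.login-dialog-banner[[:blank:]]*{/{
$!{
n
s/.*color:.*/color: rgba(255,255,255,1);\
font-size: 14;\
text-align: center;}/
}
}' "$@"
}
```

For example, printf '.login-dialog-banner {\ncolor: #d6d6d1; }\n' | rewrite_banner prints the four-line block from the question.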
Can you try the sed command below, based on the requirement stated in the question:
sed '/^\.login-dialog-banner {/{N;s/color: #d6d6d1; }/color: rgba(255,255,255,1)\;\nfont-size: 14\;\ntext-align: center\;}/}' /etc/alternative/gdm3.css
I'm searching for the line starting with the string .login-dialog-banner {, appending the next line with N, and then substituting color: #d6d6d1; } with your required data.
If the above command works for you, you can add the -i option to edit the file in place.
sed -i '/^\.login-dialog-banner {/{N;s/color: #d6d6d1; }/color: rgba(255,255,255,1)\;\nfont-size: 14\;\ntext-align: center\;}/}' /etc/alternative/gdm3.css
From man sed:
i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if SUFFIX supplied)
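A sketch of the -i[SUFFIX] form from that excerpt, assuming GNU sed (the \n in the replacement is a GNU extension). The suffix is attached to -i with no space, and the untouched original is kept as FILE.bak; the wrapper name update_banner_in_place is hypothetical:

```shell
# Hypothetical wrapper; $1 is the CSS file to edit.
# -i.bak rewrites the file in place and saves the original as "$1.bak".
update_banner_in_place() {
  sed -i.bak '/^\.login-dialog-banner {/{N;s/color: #d6d6d1; }/color: rgba(255,255,255,1);\nfont-size: 14;\ntext-align: center;}/}' "$1"
}
```

If the result is wrong, moving FILE.bak back over FILE restores the original.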

linux sed how to wrap a multiline search pattern with text

I have an HTML file which includes a section as follows:
<div id='webnews'>
... variable stuff ...
</div>
which I want to comment out as follows:
<!--
<div id='webnews'>
... variable stuff ...
</div>
-->
I can find & print the multiline text as follows:
sed '/<div id="webnews"/, /<\/div>/ { p }' filename.html
Experimenting with h, d, x and G, I have been unable to work out how to wrap either the hold buffer or the pattern buffer with '<!--' and '-->'.
Would appreciate help with this challenge.
Quick and dirty with sed (not the best idea on HTML unless you are sure of its content and structure):
sed "/<div id='webnews'/, /<\/div>/ {
/<div id='webnews'/ {
h
d
}
H
/<\/div>/ !d
x
s/^/<!--\\
/
s/$/\\
-->/
}" filename.html
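The same hold-space script with a comment on each step, since h, H, x and d are what the question was stuck on. A minimal sketch, assuming GNU sed and that the first </div> after the opening line closes the block (a nested <div> would end the range too early); comment_out_webnews is a hypothetical name:

```shell
# Hypothetical wrapper around the hold-space answer above.
comment_out_webnews() {
  sed "/<div id='webnews'>/,/<\/div>/ {
    # opening line: h copies it into the hold space, replacing what was there; d prints nothing
    /<div id='webnews'>/ {
      h
      d
    }
    # any later line of the block: H appends it to the hold space
    H
    # not the closing tag yet: keep collecting, print nothing
    /<\/div>/ !d
    # closing tag: x swaps the collected block into the pattern space
    x
    # wrap it; each backslash-newline puts a marker on its own line
    s/^/<!--\\
/
    s/\$/\\
-->/
  }" "$@"
}
```

Run it from a script rather than pasting it into an interactive bash, where the ! inside double quotes can trigger history expansion.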
This might work for you (GNU sed):
sed -e '/<div id='\''webnews'\''>/,/<\/div>/!b;/<div id='\''webnews'\''>/i\<!--' -e '/<\/div>/a\-->' file
Or perhaps:
sed $'/<div id=.webnews.>/,/<\/div>/{/<div id=.webnews.>/i\<!--\n;/<\/div>/a\-->\n}' file
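For comparison, the i\ and a\ approach wrapped the same way. It never uses the hold space: it prints <!-- before the opening line and queues --> after the closing line. A sketch assuming GNU sed, which accepts the text on the same line as i\ and a\; the function name is hypothetical and </div> is used as the end of the range:

```shell
# Hypothetical wrapper around the i\ / a\ one-liner.
comment_out_webnews_ia() {
  # Lines outside the range branch to the end (!b) and are printed unchanged;
  # the opening line gets "<!--" inserted before it, the closing line gets "-->" appended.
  sed -e "/<div id='webnews'>/,/<\/div>/!b" \
      -e "/<div id='webnews'>/i\\<!--" \
      -e "/<\/div>/a\\-->" "$@"
}
```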
Sed is not the right tool for the job.
Use sift:
sift -m '(.+)(<div id=.webnews.>.*</div>)(.+)' --replace '$1<!-- $2 -->$3'

sed/awk: Capitalize everything between patterns and lowercase small words

I found a way to capitalize the whole document with both sed and awk, but how can I convert only the text inside certain patterns from ALL CAPS to Title Case?
For example, I have an HTML file in which everything between <b> and </b> (multiple occurrences) has to be converted from TITLE to Title and, if possible, small words (1 or 2 letters) should be lowercased.
From This:
<div id="1">
<div class="p"><b>THIS IS A RANDOM TITLE</b></div>
<table class="hugetable">
...
</table>
<div class="p"><b>THIS IS ANOTHER RANDOM TITLE</b></div>
<table class="hugetable">
...
</table>
...
</div>
To this:
<div id="1">
<div class="p"><b>This is a Random Title</b></div>
<table class="hugetable">
...
</table>
<div class="p"><b>This is Another Random Title</b></div>
<table class="hugetable">
...
</table>
...
</div>
This is not the most beautiful solution but I think it works:
sed -r -e '/<b>/ {s/( .)([^ ]*)/\1\L\2/g}' -e 's/<b>(.)/<b>\u\1/' -e '/<b>/ {s/(\b.{1,2}\b)/\L\1/g}' data
Explanation:
1st expression (-e): If a line contains <b>:
Then for each word which has a space in front of it, keep the space and the first (already capitalized) character (\1) and then convert all the following characters of the word to lower case (\L\2)
2nd expression (-e): The first word after <b> has no space in front of it, so the 1st expression does not keep its first letter capitalized; select the first character after the bold tag <b>(.) and replace it uppercased <b>\u\1
3rd expression (-e): Again if a line contains <b>:
Then select words of 1 or 2 characters \b.{1,2}\b and replace them with their lowercase form \L\1
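A minimal wrapper for checking the three expressions on the sample titles, assuming GNU sed (\L, \u and \b are GNU extensions, so BSD/macOS sed will not run it). The function name title_case_bold is hypothetical:

```shell
# Hypothetical wrapper around the answer's three expressions (GNU sed).
title_case_bold() {
  # 1: on <b> lines, keep " X" and lowercase the rest of each space-preceded word
  # 2: uppercase the first character after <b>
  # 3: on <b> lines, lowercase any 1-2 character word
  sed -r -e '/<b>/ {s/( .)([^ ]*)/\1\L\2/g}' \
         -e 's/<b>(.)/<b>\u\1/' \
         -e '/<b>/ {s/(\b.{1,2}\b)/\L\1/g}' "$@"
}
```

One side effect worth knowing: the 1st expression also matches the space before class=, so everything up to the next space (including the first title word) is lowercased, which is why the 2nd expression is needed.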

RegexKitLite: How to convert a PHP regex expression to Objective-C

I used this regex to search for img src attributes in a string on one of my sites.
Now I want to use this expression to do the same thing in Objective-C. How can I do that using RegexKitLite?
This is my expression:
/<img.+src=[\'"]([^\'"]+)[\'"].*>/i
@Tim Pietzcker
Your code works great, but, for example, if I try to search for img in this string:
<p> <img src="http://www.nationalgeographic.it/images/2011/07/29/115624013-20034abf-4d91-40fe-98ab-782f06a9854d.jpg" width="140" align="left" hspace="10">Scoperta in America del Sud la sepoltura pre-incaica di un uomo circondato da coltelli cerimoniali che secondo gli archeologi eseguiva sacrifici umani</p>
I have this result in my array:
matchArray: (
"<img src=\"http://www.nationalgeographic.it/images/2011/07/29/115624013-20034abf-4d91-40fe-98ab-782f06a9854d.jpg\" width=\"140\" align=\"left\" hspace=\"10\">"
)
How can I modify your regex to get only the content of the src attribute? Thank you so much.
The / delimiters are throwing you off. Also, you should at least use lazy quantifiers. Try this:
NSString *regexString = @"(?i)<img.+?src=['\"]([^'\"]+)['\"].*?>";
This breaks when filenames contain quotes, by the way. Could that be a problem for you?
A regex that's a bit safer (and that handles quotes well) would be
NSString *regexString = @"(?i)<img[^<>]+?src=(['\"])((?:(?!\\1).)+)\\1[^<>]*>";
However, the matched filename will now be in capture group 2, not 1, so you need to modify any code that uses the filename after the match.