sed wildcard and back reference - sed

I have a text log file to which I need to add some basic HTML-like markup (to match decades of older logs).
Every line basically looks like this
<constant> <various> text
and should transform to this in each line
<font color="#7a40b0">constant:</font><font color="#0000fc">various:</font> text
constant always remains the same, various changes (these are usernames, could be anything, e.g. F0oBar 1)
I tried
sed 's|'\<constant\>\ \<'.*'\>'|'\<font\ color\=\"#7a40b0\"\>constant:\</font\>\<font\ color\=\"#0000fc\"\>\1:\</font\>'|g'
but this just returns
<font color="#7a40b0">constant:</font><font color="#0000fc">1:</font>
I tried searching for how to make the back reference work, but I'm now sure I'm messing up the wildcard. As soon as I use parentheses, brackets and/or ^, stuff breaks (the regex doesn't match anymore).
I've used https://sed.js.org to give me a visual help so I hope this gave me correct feedback.

Using sed
$ sed -E 's|<([^ ]*)> <([^ ]*)>|<font color="#7a40b0">\1:</font><font color="#0000fc">\2:</font>|' input_file
<font color="#7a40b0">constant:</font><font color="#0000fc">various:</font> text
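If the usernames can contain spaces (as "F0oBar 1" in the question suggests), [^ ]* stops at the first space and the line no longer matches. Below is a minimal sketch of a variant that matches up to the closing > instead; the sample line is made up for illustration. Note that the whole script sits inside single quotes: in the original attempt the \1 was outside the quotes, so the shell stripped the backslash and sed only saw a literal 1.

```shell
# Hypothetical sample line; the username contains a space.
line='<constant> <F0oBar 1> text'

# [^>]* matches everything up to the closing '>', spaces included.
# Single quotes keep the shell from touching \1 before sed sees it.
result=$(printf '%s\n' "$line" |
  sed -E 's|<constant> <([^>]*)>|<font color="#7a40b0">constant:</font><font color="#0000fc">\1:</font>|')

printf '%s\n' "$result"
```

Here constant is written literally in the pattern, since the question says it never changes.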

Related

Replace words but only after a colon

I have been researching this for quite some time but cannot seem to find an answer. Perhaps someone here can help.
I am trying to use sed to replace words in yml / yaml files. Since some of the words are included in the names I want to only replace words that appear after the colon (':').
For example, if the .yml file includes:
en:
label_some_tracker: A tracker
label_all_tracker: All trackers
label_attachment_type_trackers: Select trackers.
tracker_plural: trackers
and I want to replace all occurrences of tracker with issue in all values. The pattern:
s/tracker/issue/
also changes the names of the fields, which breaks my code.
I can reduce the size of the problem somewhat by including terms for all possible variants of a word. For example:
s/trackers/issues/
s/tracker/issue/
but that doesn't deal with all situations.
I have tried inserting a space before the search term:
s/ tracker/ issue/
but that matches names where the search term is at the beginning of the line.
If I search for whole words then it still seems to pick up the names because ':' and '_' are 'non word' characters.
If I put spaces at the beginning and end of the search term, it misses words that are at the end of a line or words with punctuation marks before the trailing space.
The only sure way seems to be to only replace words after a colon (':') but I cannot seem to figure out how to do that with sed.
Does anyone here know how?
With GNU sed:
sed -E 's/(:.*)tracker/\1issue/g' file
Output:
en:
label_some_tracker: A issue
label_all_tracker: All issues
label_attachment_type_trackers: Select issues.
tracker_plural: issues
Or replace only the second occurrence on each line:
sed 's/tracker/issue/2' file
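One caveat with the (:.*)tracker pattern: .* is greedy, so each pass only replaces the last tracker after the colon, even with the g flag. If a value can contain the word more than once, a loop is one way around it. This is only a sketch with made-up sample lines, and it assumes the replacement text never contains the search term (otherwise the loop would not terminate).

```shell
# Hypothetical sample input; the second value contains the word twice.
yaml='label_some_tracker: A tracker
tracker_plural: trackers and more trackers'

# :a  defines a label.
# s   replaces the last "tracker" that follows the first colon.
# ta  jumps back to :a as long as the substitution succeeded.
result=$(printf '%s\n' "$yaml" |
  sed -E -e ':a' -e 's/(:.*)tracker/\1issue/' -e 'ta')

printf '%s\n' "$result"
```

The keys before the colon are left alone because the pattern requires a colon before the match.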

Adding a comment character in most simple possible way

I want to search a file for a specific string and then place a comment at the beginning of that string. But I need an answer that avoids regex, global changes, and all the other fancy stuff.
I wrote this line:
sed -i.bak '/PermitRootLogin no/# PermitRootLogin no/' ./sshd_config
but I get an error:
sed: -e expression #1, char 21: comments don't accept any addresses
I assume the issue is that I need to escape the # character, but I'm not finding any resources on how to do that, or even mentioning it. I've tried various combinations of putting ^ or \ or \^ in front of the # but I'm just not getting it right.
Please note I am intentionally repeating the text to be replaced. I would like the most simple possible solution to this question: how to replace "XYZ" with "# XYZ" in the most obvious possible way.
As indicated in the comments by @mlt, you could try adding an s at the beginning of your sed command. Straight from his comment:
s/PermitRootLogin....
I see that you said you're intentionally repeating the text to be replaced. If by that you mean you want it to be the same, maybe consider grouping your matched text. I understand you may have meant that you just want it hand-typed. Anyway, here is how to match the grouped text and add the comment character:
s/\(PermitRootLogin\)/# \1/
The escaped parentheses \( \) mark the matched text as a group (with sed -E they are written unescaped), and the \1 puts that matched group into the replacement.
I hope this was helpful. Happy coding! Leave a comment if you have any questions.
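For completeness, here is roughly what the full command from the question looks like with the s added. Without it, sed reads /PermitRootLogin no/ as an address and the # that follows as the start of a comment, which is where the error comes from. The sketch works on a scratch file with made-up contents rather than the real sshd_config.

```shell
# Work on a scratch copy (hypothetical contents), not the real sshd_config.
tmpdir=$(mktemp -d)
printf 'Port 22\nPermitRootLogin no\n' > "$tmpdir/sshd_config"

# The leading "s" selects the substitute command: s/old/new/
# -i.bak edits the file in place and keeps the original as sshd_config.bak.
sed -i.bak 's/PermitRootLogin no/# PermitRootLogin no/' "$tmpdir/sshd_config"

result=$(cat "$tmpdir/sshd_config")
printf '%s\n' "$result"

# Remove the scratch directory again.
rm -rf "$tmpdir"
```

Running the command a second time on the same file would add another "# ", since the commented line still contains the search text.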

Perl - replace line in txt file with different matches

I've just started learning Perl this morning and my main aim was to replace lines of text. Suppose I have the following text file;
manufacturer=BMW
manufacturer=Honda
manufacturer=Mercedes
manufacturer=Toyota
manufacturer=Noble
manufacturer=Maserati
manufacturer=Jaguar
manufacturer=Ford
I want to replace all the lines so that the text file looks like this;
manufacturer=XXX
manufacturer=XXX
manufacturer=XXX
manufacturer=XXX
manufacturer=XXX
manufacturer=XXX
manufacturer=XXX
manufacturer=XXX
I've learnt how to replace a line of text which matches a particular case with my intended text as follows ;
s/BMW/XXX/ig
but considering that this file has different cases, I don't want to keep updating the Perl code with different manufacturers (Honda, Mercedes, Toyota) every time and then re-running the code. Surely, there must be a way in which I can simply search for lines beginning with (without worrying about the manufacturer)
manufacturer=*whatever*
and then replace the entire line with
manufacturer=XXX
can somebody please shed some light on how to go about doing this?
The way I would write this is
s/^manufacturer=\K.+/XXX/
The ^ at the beginning makes sure the manufacturer= starts at the beginning of the string, instead of just appearing in it anywhere.
The \K (for Keep) metacharacter means to ignore all of the preceding stuff in the substitution, so the pattern matches ^manufacturer=.+ but only .+ is replaced.
Lastly, the .+ matches everything up to the end of the string or a trailing newline (. doesn't match newlines).
You need to leverage the full power of regexes:
s/^manufacturer=\w+/manufacturer=XXX/ig
A job for regular expressions:
s/manufacturer=.*/manufacturer=XXX/g
Per your comment, this matches everything between = and EOL and replaces it with XXX.

Minify HTML files in text/html templates

I use mustache/handlebar templates.
eg:
<script id="contact-detail-template" type="text/html">
<div>... content to be compressed </div>
</script>
I am looking to compress/minify my HTML files in the templates for the best compression.
YUIcompressor and Closure do not work: they treat the content as script and give me script errors.
HTMLCompressor does not touch them even as it thinks that it is a script.
How do I minify the content in the script tags with type text/html?
Can I use a library?
If not, is sed or egrep a preferable way? Do you have sed/egrep syntax to remove empty lines (with just spaces or tabs), remove all tabs, trim extra spaces?
Thanks.
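sed -e 's/^[ \t]*//' -e '/^$/d' yourfile
This removes the leading spaces and tabs from every line and deletes the lines that are left empty.
sed -e ':a;N;$!ba' -e 's/^[ \t]*//' -e 's/\n[ \t]*//g' yourfile
This (with GNU sed) reads the whole file into the pattern space, removes the leading spaces and tabs from every line, and concatenates all your code into one line. Keep the script in single quotes: inside double quotes the shell expands $! before sed ever sees it.
Sorry if I missed something.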
Use sed ':a;N;$!ba;s/>\s*</></g' file to remove whitespace and newlines where they are not needed. Unlike ghaschel's example, this does not strip the whitespace at the beginning of lines, so text inside <pre> and <p> tags is preserved.
It only removes the whitespace between > and <, which is a common source of bloat in an HTML file. The same command can also be used on XML files such as Atom and RSS feeds.
I personally use this as a pipe in my site generator; it reduces the size of a typical file and can be used in conjunction with gzip.
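A quick way to check what this does is to pipe a small snippet through it. The sketch below uses a made-up fragment and swaps \s for [[:space:]] and the semicolons for separate -e options, on the assumption that this form is accepted by more sed implementations than the GNU-only spelling.

```shell
# Hypothetical template fragment with indentation and a double space in the text.
html='<div>
    <p>some  text</p>
</div>'

# :a / N / $!ba  collect the whole input into the pattern space.
# The final s command drops whitespace (newlines included) between > and <.
result=$(printf '%s\n' "$html" |
  sed -e ':a' -e 'N' -e '$!ba' -e 's/>[[:space:]]*</></g')

printf '%s\n' "$result"
```

The double space inside the paragraph text survives, because only whitespace that sits directly between two tags is removed.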
Try using Pretty Diff to minify this kind of code. It will only assume the stuff inside script tags is JavaScript if there is no mime type or if the type is one of the various JavaScript types. It is also intelligent enough to know which white space is okay to remove without corrupting the output of content or the recursive beautification of code later.

Escape pipe-character in org-mode

I've got a table in Emacs org-mode, and the contents are regular expressions. I can't seem to figure out how to escape a literal pipe character (|) that's part of a regex, though, so it keeps getting interpreted as a table-cell separator. Could someone point me to some help? Thanks.
Update: I'm also looking for escapes for a slash (/), so that it doesn't trigger the start of an italic/emphasis sequence. I experimented with \/ and \// - for example, suppose I want the literal text /foo/ in a table cell. Here are 3 ways of attempting it:
| /foo/ | \/foo/ | \//foo/ |
In LaTeX export, that becomes:
\emph{foo} & \/foo/ & \//foo/
So none of them is the plain /foo/ I'm hoping for.
\vert for the pipe.
Forward slashes seem to work fine for me unescaped when exporting both to HTML and PDF.
Use a broken-bar character, “¦”, Unicode 00A6 BROKEN BAR. This may or may not work for your specific needs, but it’s a good visual approximation.
You could also format the relevant text as verbatim or code:
Text in the code and verbatim string is not processed for Org mode
specific syntax; it is exported verbatim.
So you might try something like =foo | bar= (verbatim) or foo ~|~ bar (code). It does change the output format, though.