Perl - replace line in txt file with different matches

I started learning Perl this morning, and my main aim is to replace lines of text. Suppose I have the following text file:
manufacturer=BMW
manufacturer=Honda
manufacturer=Mercedes
manufacturer=Toyota
manufacturer=Noble
manufacturer=Maserati
manufacturer=Jaguar
manufacturer=Ford
I want to replace all the lines so that the text file looks like this:
manufacturer=XXX
manufacturer=XXX
manufacturer=XXX
manufacturer=XXX
manufacturer=XXX
manufacturer=XXX
manufacturer=XXX
manufacturer=XXX
I've learnt how to replace text that matches a particular case with my intended text, as follows:
s/BMW/XXX/ig
But since this file contains many different manufacturers, I don't want to keep updating the Perl code with each one (Honda, Mercedes, Toyota) and re-running it every time. Surely there must be a way to simply search for lines beginning with the following (without worrying about the manufacturer):
manufacturer=*whatever*
and then replace the entire line with
manufacturer=XXX
Can somebody please shed some light on how to go about doing this?

The way I would write this is
s/^manufacturer=\K.+/XXX/
The ^ at the beginning makes sure the manufacturer= starts at the beginning of the string, instead of just appearing in it anywhere.
The \K (for Keep) metacharacter means to ignore all of the preceding stuff in the substitution, so the pattern matches ^manufacturer=.+ but only .+ is replaced.
Lastly, the .+ matches everything up to the end of the string or a trailing newline (. doesn't match newlines).
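For comparison, here is a rough Python sketch of the same idea. Python's built-in re module has no \K, so this version keeps the prefix with a capture group instead. The function name and the sample data are made up for illustration.

```python
import re

def mask_manufacturers(text):
    # Group 1 keeps the "manufacturer=" prefix; .+ is the part that gets replaced.
    # re.MULTILINE makes ^ match at the start of every line, not only the string.
    return re.sub(r"^(manufacturer=).+", r"\1XXX", text, flags=re.MULTILINE)

sample = "manufacturer=BMW\nmanufacturer=Honda\nmodel=Civic\n"
masked = mask_manufacturers(sample)
print(masked)
```

Lines that do not start with manufacturer= (such as the model= line in the sample) are left untouched.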

You need to leverage the full power of regexes:
s/^manufacturer=\w+/manufacturer=XXX/ig

A job for regular expressions:
s/manufacturer=.*/manufacturer=XXX/g
Per your comment, this matches everything between = and EOL and replaces it with XXX.
Comprehensive docs here

Related

Replace words but only after a colon

I have been researching this for quite some time but cannot seem to find an answer. Perhaps someone here can help.
I am trying to use sed to replace words in yml / yaml files. Since some of the words are included in the names I want to only replace words that appear after the colon (':').
For example, suppose the .yml file includes:
en:
label_some_tracker: A tracker
label_all_tracker: All trackers
label_attachment_type_trackers: Select trackers.
tracker_plural: trackers
and I want to replace all occurrences of tracker with issue in all values. The pattern:
s/tracker/issue/
also changes the names of the fields, which breaks my code.
I can reduce the size of the problem somewhat by including terms for all possible variants of a word. For example:
s/trackers/issues/
s/tracker/issue/
but that doesn't deal with all situations.
I have tried inserting a space before the search term:
s/ tracker/ issue/
but that matches names where the search term is at the beginning of the line.
If I search for whole words, it still seems to pick up the names, because ':' and '_' are 'non-word' characters.
If I put spaces at the beginning and end of the search term, it misses words that are at the end of a line and words with punctuation marks before the trailing space.
The only sure way seems to be to only replace words after a colon (':') but I cannot seem to figure out how to do that with sed.
Does anyone here know how?
With GNU sed:
sed -E 's/(:.*)tracker/\1issue/g' file
Output:
en:
label_some_tracker: A issue
label_all_tracker: All issues
label_attachment_type_trackers: Select issues.
tracker_plural: issues
To replace only the second occurrence on each line:
sed 's/tracker/issue/2' file
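Note that (:.*) is greedy, so the first sed command rewrites only the last tracker after the colon on each line; that is enough for the sample above, where each value contains one. As a hedged alternative, here is a small Python sketch that splits each line at its first colon and replaces every occurrence in the value part. The function name is hypothetical, and the sketch assumes simple one-line key: value entries rather than full YAML.

```python
def replace_in_values(text, old, new):
    result = []
    for line in text.splitlines():
        # partition() splits at the first colon; lines without one keep value == "".
        key, sep, value = line.partition(":")
        result.append(key + sep + value.replace(old, new))
    return "\n".join(result)

yml = (
    "en:\n"
    "label_some_tracker: A tracker\n"
    "tracker_plural: trackers and more trackers"
)
converted = replace_in_values(yml, "tracker", "issue")
print(converted)
```

The keys before the colon are never touched, and values with several matches are fully converted.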

How do I tell org-mode to disable headings in verbatim text?

How do I make org mode not interpret a line that begins with an asterisk as a headline? I have some verbatim text in my org mode document. Some of the lines begin with an asterisk. Org mode interprets these lines as headlines. I don't want that.
Here is the text with some context:
* 20160721 Headline for July 21, 2016
I created a git repository for rfc-tools. It's in
~/Documents/rfc-tools.
Renamed grep-rfc-index.sh to search-rfc-index.sh because it searches.
That it uses grep is irrelevant.
Wrote a README.md for the project. Here it is:
#+BEGIN_SRC text
----- BEGIN QUOTED TEXT -----
This is the README.md for rfc-tools, a collection of programs for
processing IETF RFCs.
* fetch-rfcs-by-title.sh downloads into the current directory the RFCs
whose titles contain the string given on the command line. Uses an
rfc-index file in the current directory. Prefers the PDF version of
RFCs but will obtain the text version if the PDF is not available.
* fetch-sip-rfcs.sh downloads RFCs that contain "Session Initiation"
in their titles into the current directory.
* search-rfc-index.sh searches an rfc-index file in the current
directory for the string given on the command line. The string can
contain spaces.
* join-titles.awk turns the contents of an rfc-index file into a
series of long lines. Each line begins with the RFC number, then a
space, then the rest of the entry from the rfc-index.
----- END QUOTED TEXT -----
#+END_SRC
I want the lines between "----- BEGIN QUOTED TEXT -----" and "----- END QUOTED TEXT -----" to be plain text and subordinate to the headline "20160721 Headline for July 21, 2016". Org mode interprets all lines that begin with an asterisk as top-level headlines.
By the way, the verbatim text is Markdown. I hope that doesn't matter.
This worked for me:
#+BEGIN_SRC markdown
Try wrapping your text in one of the various special block tags. For example you could try putting your text inside these tags:
#+BEGIN_SRC text
...
#+END_SRC
If that doesn't meet your needs, you could try:
#+BEGIN_EXAMPLE
...
#+END_EXAMPLE
Which will render everything inside the tags without markup and in a monospace font.
If that doesn't work either, you could try one of the other kinds of tags listed here.
Escape the * with a comma, like this: ,*
Probably if you type C-c ' to enter a special edit and then exit, org will do that for you.
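If the text is generated by a script rather than typed in Emacs, the comma escaping could be sketched in Python as below. This is a simplified illustration with a hypothetical function name; it only handles lines that begin with an asterisk, and Org's own escaping may cover more cases than this.

```python
def escape_block_lines(body):
    escaped = []
    for line in body.splitlines():
        # Prefix a comma so the line no longer starts with "*".
        if line.startswith("*"):
            line = "," + line
        escaped.append(line)
    return "\n".join(escaped)

readme = "Intro line\n* fetch-sip-rfcs.sh downloads RFCs\n  continued text"
print(escape_block_lines(readme))
```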
I think the answer is "You can't do that". I found a way to work around the problem using drawers. The org-mode manual explains that a drawer is a place to put text that you don't want to see all of the time.
A StackExchange user had a question about getting a custom org drawer to open/close. It seems that for older versions of org-mode, you must tell org-mode the names of your drawers. For example, if you have a drawer named "COMMANDS"
:COMMANDS:
ls
cat
grep
:END:
you must tell org-mode the name of the drawer using the #+DRAWERS keyword:
#+DRAWERS: COMMANDS
and restart org-mode.
I found a solution:
Escape Character
You may sometimes want to write text that looks like Org syntax, but should really read as plain text. Org may use a specific escape character in some situations, i.e., a backslash in macros (see Macro Replacement) and links (see Link Format), or a comma in source and example blocks (see Literal Examples). In the general case, however, we suggest to use the zero width space. You can insert one with any of the following:
C-x 8 RET zero width space RET
C-x 8 RET 200B RET
For example, in order to write ‘[[1,2]]’ as-is in your document, you may write instead
[X[1,2]]
where ‘X’ denotes the zero width space character.
How to remove zero width space:
sed -i "s/$(echo -ne '\u200b')//g" abc.txt
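Where echo -e does not understand \u escapes, the same cleanup can be sketched in Python. The function name is made up for illustration; reading and writing the file is left out to keep the example short.

```python
ZERO_WIDTH_SPACE = "\u200b"

def strip_zero_width_spaces(text):
    # Remove every U+200B character and leave all other text alone.
    return text.replace(ZERO_WIDTH_SPACE, "")

escaped = "[" + ZERO_WIDTH_SPACE + "[1,2]]"
print(strip_zero_width_spaces(escaped))
```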

Why does Github Flavored Markup only add newlines for lines that start with [\w\<]?

In our site (which is aimed at highly non-technical people), we let them use Markdown when sending emails. That way, they get nice things like bold, italic, etc. Being non-technical, however, they would never get past the “add two lines to make newlines actually work” quirk.
For that reason mainly, we are using a variant of Github Flavored Markdown.
We mainly borrowed this part:
# in very clear cases, let newlines become <br /> tags
text.gsub!(/^[\w\<][^\n]*\n+/) do |x|
x =~ /\n{2}/ ? x : (x.strip!; x << "  \n")
end
This works well, but in some cases it doesn’t add the new-lines, and I guess the key to that is the “in very clear cases” part of that comment.
If I interpret it correctly, this is only adding newlines for lines that start with either a word character or a ‘<’.
Does anyone know why that is? Particularly, why ‘<’?
What would be the harm in just adding the two spaces to essentially anything (lines starting with spaces, hyphens, anything)?
'<' character is used at the beginning of a line to quote messages. I guess that is the reason.
The other answer to this question is quite wrong. This has nothing to do with quoting, and the character for markdown quoting is >.
^[\w\<][^\n]*\n+
Let's break the above regex into parts:
^ anchors the match at the start of a line (in Ruby, ^ matches at the start of every line, not only at the start of the string).
[\w\<] matches a word character or a literal <. Inside a character class, \< is just an escaped <, not the GNU start-of-word boundary.
[^\n]* matches any number of non-newline characters.
\n matches a newline.
+ is the "one or more" quantifier, so \n+ matches one or more consecutive newlines.
I believe, but am not 100% sure, that this simply is used to set x to a line of text. Then, the heavy work is done with the next line:
x =~ /\n{2}/ ? x : (x.strip!; x << "  \n")
This says: if x matches the regex \n{2} (that is, contains two consecutive line breaks), leave x as is. Otherwise, strip x and append two spaces and a newline, which is Markdown's syntax for a hard line break.
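To experiment with which lines are affected, the Ruby snippet can be approximated in Python. This is a rough port, not the original code: the function name is hypothetical, and Python's \w and strip() are not guaranteed to match Ruby's in every edge case.

```python
import re

def add_hard_breaks(text):
    def fix(match):
        x = match.group(0)
        # A blank line already ends the paragraph, so leave the match as is.
        if re.search(r"\n{2}", x):
            return x
        # Otherwise end the line with two spaces and a newline (a hard break).
        return x.strip() + "  \n"
    # re.MULTILINE makes ^ match at every line start, as ^ does in Ruby.
    return re.sub(r"^[\w<][^\n]*\n+", fix, text, flags=re.MULTILINE)

demo = "a\nb\n\n- c\nd\n"
print(repr(add_hard_breaks(demo)))
```

In the demo string, the line starting with a hyphen is skipped because it begins with neither a word character nor <.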

How to use '^@' in Vim scripts?

I'm trying to work around a problem with using ^@ (i.e., <ctrl-@>) characters in Vim scripts. I can insert them into a script, but when the script runs, the line seems to be truncated at the point where the ^@ was located.
My kludgy solution so far is to store a ^@ in a variable, then reference the variable in the script wherever I would have quoted a literal ^@. Can someone tell me what's going on here? Is there a better way around this problem?
That is one reason why I never use raw special character values in scripts. While ^@ does not work, the string <C-@> in mappings works as expected, so you may use one of
nnoremap <C-@> {rhs}
nnoremap <Nul> {rhs}
It is strange, but you cannot use <Char-0x0> here. Some notes about the null byte in strings:
Inserting a null byte into a string truncates it: Vim uses old C-style strings that end with a null byte, so a null byte cannot appear inside a string. These strings are very inefficient, so if you want to generate a very large text, try accumulating it into a list of lines (using setline is very fast, as the buffer is represented as a list of lines).
Most functions that return a list of strings (like readfile, getline(start, end)) or take a list of strings (like writefile, setline, append) treat \n (NL) as Null. It is also the internal representation of buffer lines, see :h NL-used-for-Nul.
If you try to insert a \n character into the command line, you will see Null shown (but it is really a newline). If you want to edit a file that has \n in its filename (this is possible on *nix), you will need to precede the newline with a backslash.
The byte ctrl-@ is also known as '\0'. Many languages, programs, etc. use it as an "end of string" marker, so it's not surprising that Vim gets confused there. If you must use this byte in the middle of a script string, it sounds like your workaround is a decent one.
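The truncation described above is a general property of null-terminated strings and can be observed outside Vim. The following Python sketch uses the standard ctypes module purely as an illustration of C-style string semantics; it does not show anything about Vim's internals.

```python
import ctypes

# A buffer holding "ab", a null byte, then "cd" (plus the implicit terminator).
buf = ctypes.create_string_buffer(b"ab\x00cd")

# .value reads the buffer as a C string, so it stops at the first null byte.
print(buf.value)

# .raw returns every byte in the buffer, including the embedded null.
print(buf.raw)
```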

Removing a trailing Space from Regex Matched group

I'm using the regular expression library icucore via RegKit on the iPhone to replace a pattern in a large string.
The pattern I'm looking for looks something like this:
| hello world (P1)|
I'm matching this pattern with the following regular expression
\|((\w*|.| )+)\((\w\d+)\)\|
This splits a match into 3 groups, of which group 1 (the string) and group 3 (the string in parentheses) are of interest to me.
I'm converting these formatted strings into HTML links, so the above would be transformed into
Hello world
My problem is the trailing space in the first group: when the link is highlighted and underlined, the line extends beyond the printed characters.
While I know I could extract all the matches and process them manually, using the search-and-replace feature of the ICU library is a much cleaner solution, so I would rather not do that.
Many thanks as always
Would the following work as an alternate regular expression?
\|((\w*|.| )+)\s+\((\w\d+)\)\|
Inserting the extra \s+ pulls the space outside the 1st grouping.
Though, given your example & regex, I'm not sure why you don't just do:
\|(.+)\s+\((\w\d+)\)\|
Which will have the same effect. However, both your original regex and my simpler one would fail on:
| hello world (P1)| and on the same line | howdy world (P1)|
where it would roll it up into 1 match.
\|\s*([\w ,.-]+)\s+\((\w\d+)\)\|
will put the trailing space(s) outside the capturing group. This will of course only work if there always is a space. Can you guarantee that?
If not, use
\|\s*([\w ,.-]+(?<!\s))\s*\((\w\d+)\)\|
This uses a lookbehind assertion to make sure the capturing group ends in a non-space character.
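The lookbehind version can be tried out with Python's re module, which accepts this pattern unchanged. The sketch below is only an illustration: the link format in the replacement is a made-up placeholder, since the real markup is not shown in the question, and the function name is hypothetical.

```python
import re

PATTERN = re.compile(r"\|\s*([\w ,.-]+(?<!\s))\s*\((\w\d+)\)\|")

def to_links(text):
    # Group 1 is the label without surrounding spaces, group 2 the code in parentheses.
    # The href value is a placeholder assumption, not the asker's real URL scheme.
    return PATTERN.sub(r'<a href="\2">\1</a>', text)

line = "| hello world (P1)| and on the same line | howdy world (P2)|"
print(PATTERN.findall(line))
print(to_links(line))
```

Because the character class excludes | and (, two patterns on the same line are matched separately instead of being rolled into one match.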