Print strings alongside math results in s// substitution with the /e modifier - perl

I am trying to write a very simple one liner to find cases of:
foo N
and replace them with
foo N-Y
For example, if I had 3 files and they had the following lines in them:
foo 5
foo 3
foo 9
After the script is run with Y=4, the lines would read:
foo 1
foo -1
foo 5
I stumbled upon an existing thread that suggested using /e to run code in the replace half of the substitute command and was able to effectively subtract Y from all my matches, but I have no idea how to best print "foo" back into the file since when I try to separate foo and the number into two capture groups and print them back in, perl thinks I am trying to multiply them and wants an operator.
Here's where I'm at:
find . -iname "*somematch*" -exec perl -pi -e 's/(Foo *)(\d+)/$1$2-4/e' {} \;
Of course this doesn't work, "Scalar found where operator expected at -e line 1, near "$1$2." I'm at a loss as to how best to proceed without writing something much longer.
Edit: To be more specific, if I have the /e option enabled to be able to perform math in the substitution, is there a simple way to print the string in another capture group in that substitution without it trying to do math to it?
Alternatively, is there a simple way to surgically perform the substitution on only part of the pattern? I tried to combine m// and s/// to achieve the results but ended up getting nowhere.

The replacement part is treated as code under /e so it need be written using legal syntax, like you'd use in a program. Writing $t$v isn't legal syntax ($1$2 in your regex).
One way to concatenate strings is $t . $v. Then you also need parenthesis around the addition, since by precedence rules the strings $1 and $2 are concatenated first, and that alphanumeric string attempted in addition, drawing a warning. So
perl -i -pe's/(Foo *)([0-9]+)/$1.($2-4)/e'
I replaced \d with [0-9] since \d matches all kinds of "digits," from all over Unicode, what doesn't seem to be what you need.
There is another way if the math comes after the rest of the pattern, as it does in your examples
perl -i -pe's/Foo *\K([0-9]+)/$1-4/e'
Here the \K is a form of positive lookbehind which drops all matches previous to that anchor, so they are not consumed. Thus only the [0-9]+ is replaced, as needed.

Related

Extracting substring from inside bracketed string, where the substring may have spaces

I've got an application that has no useful api implemented, and the only way to get certain information is to parse string output. This is proving to be very painful...
I'm trying to achieve this in bash on SLES12.
Given I have the following strings:
QMNAME(QMTKGW01) STATUS(Running)
QMNAME(QMTKGW01) STATUS(Ended normally)
I want to extract the STATUS value, ie "Ended normally" or "Running".
Note that the line structure can move around, so I can't count on the "STATUS" being the second field.
The closest I have managed to get so far is to extract a single word from inside STATUS like so
echo "QMNAME(QMTKGW01) STATUS(Running)" | sed "s/^.*STATUS(\(\S*\)).*/\1/"
This works for "Running" but not for "Ended normally"
I've tried switching the \S* for [\S\s]* in both "grep -o" and "sed" but it seems to corrupt the entire regex.
This is purely a regex issue, by doing \S you requested to match non-white space characters within (..) but the failing case has a space between which does not comply with the grammar defined. Make it simple by explicitly calling out the characters to match inside (..) as [a-zA-Z ]* i.e. zero or more upper & lower case characters and spaces.
sed 's/^.*STATUS(\([a-zA-Z ]*\)).*/\1/'
Or use character classes [:alnum:] if you want numbers too
sed 's/^.*STATUS(\([[:alnum:] ]*\)).*/\1/'
sed 's/.*STATUS(\([^)]*\)).*/\1/' file
Output:
Running
Ended normally
Extracting a substring matching a given pattern is a job for grep, not sed. We should use sed when we must edit the input string. (A lot of people use sed and even awk just to extract substrings, but that's wasteful in my opinion.)
So, here is a grep solution. We need to make some assumptions (in any solution) about your input - some are easy to relax, others are not. In your example the word STATUS is always capitalized, and it is immediately followed by the opening parenthesis (no space, no colon etc.). These assumptions can be relaxed easily. More importantly, and not easy to work around: there are no nested parentheses. You will want the longest substring of non-closing-parenthesis characters following the opening parenthesis, no mater what they are.
With these assumptions:
$ grep -oP '\bSTATUS\(\K[^)]*(?=\))' << EOF
> QMNAME(QMTKGW01) STATUS(Running)
> QMNAME(QMTKGW01) STATUS(Ended normally)
> EOF
Running
Ended normally
Explanation:
Command options: o to return only the matched substring; P to use Perl extensions (the \K marker and the lookahead). The regexp: we look for a word boundary (\b) - so the word STATUS is a complete word, not part of a longer word like SUBSTATUS; then the word STATUS and opening parenthesis. This is required for a match, but \K instructs that this part of the matched string will not be returned in the output. Then we seek zero or more non-closing-parenthesis characters ([^)]*) and we require that this be followed by a closing parenthesis - but the closing parenthesis is also not included in the returned string. That's a "lookahead" (the (?= ... ) construct).

rename multiple files possible unintended interpolation

I'm using brew rename to rename multiple files...
file-24.png => file.png
file-48.png => file#2x.png
file-72.png => file#3x.png
the first one is succeed with,,
rename 's/-24//g' *
the second and third...
rename 's/-48/#2x/g' *
and getting Possible unintended interpolation of #2 in string at (eval 2) line 1...
escaping doesnt work..
rename 's/-48/\#2x/g' *
other possible ways to rename multiple files like this case are also welcome..
I don't know what "brew rename" is, but if it uses normal regex
's/pattern/q(#replacement)/e'
This uses /e modifier to evaluate the replacement side as code, where q() operator (single quotes) is used to insert literal characters.
Another way is to use \x40 for # character
's/pattern/\x40replacement/'
or just escape it, use \# in the replacement.
This is suitable for when there's just one character to deal with, like here. if there's more than that then it's easier to single-quote the whole thing, with q() (for which we need /e flag).
Can't help it but ask -- are you certain that you want to have # in a file name? That character gets interpreted in various ways by many tools. For instance, sticking that file name in a variable in a Perl script leads to no end of trouble. Why not even simply file_at_2x.png?
This may be more of a curiousity, but if you have a lot of files you can rename them all with
's{ \-([0-9]+) }{ ($r = $1/24) > 1 && qq(_at_${r}x) || q() }ex'
This captures the number ([0-9]+) into $1. Then, it finds the ratio ($r = $1/24) and if that is >1 then (&& short-circuits) it replaces -number with _at_${r}x, otherwise (||) removes it by putting an empty string, q().
I use {}{} delimiters so that I may use / inside, and }x allows spaces inside, for readability.
Please test this carefully with (a copy of) your actual files, as always.
I know this question is old and maybe the version of rename that apt-get installs is lightly different or improved. However, escaping with a single backslash seems to work:
$ rename -n -v 's/-48/\#2x/g' *
rename(foo-48.txt, foo#2x.txt)

Please explain one line of perl code

I have such line from https://camlistore.googlesource.com/camlistore/+/master/third_party/rewrite-imports.sh
find . -type f -name '*.go' -exec perl -pi -e 's!"code.google.com/!"camlistore.org/third_party/code.google.com/!' {} \;
I would like help understanding what exactly this does:
perl -pi -e 's!"code.google.com/!"camlistore.org/third_party/code.google.com/!'
Especialy exclamation marks and ". Thanks!
From perldoc perlrun:
-p means "run the expression for each line, and print the result"
-i means "edit the input file in place"
-e means "the next parameter is the Perl expression to evaluate"
For the expression itself:
The ! marks are the separators for the s (substitution) operator. Any non-alphanumeric character can be used for that - whatever follows the s.
The " characters don't mean anything special, they're just part of the text to be replaced, and the replacement.
So we have:
s: substitute
!: (separator)
"code.google.com/: text to find
!: (separator)
"camlistore.org/third_party/code.google.com/: replacement text
!: (separator)
Which all means:
For each line in the file
Find the text "code.google.com/
And (if found) replace it with "camlistore.org/third_party/code.google.com/
The bangs ! are just an alternative delimiter for the search and replace regex s///.
Because the content of the search and replace includes forward slashes, it makes sense to use a different delimiter to avoid having to escape them all. Exclamation points are sometimes used for this purpose s!!!, but my preferred alternate are braces: s{}{}.
As for what that code is done, it's replacing all references to "code.google.com/ with "camlistore.org/third_party/code.google.com/ in the found files.
This is a pretty straightforward search-and-replace. The s/PATTERN/REPLACEMENT/ operator sees if a string matches the regular expression pattern and replaces the part that matches with the value of the replacement string.
Since sometimes / characters are an inconvenient delimiter (such as dealing with web URIs), Perl allows you to swap them out for other characters, in this case they chose to use !.
The -p switch causes Perl to assume a loop around the code in question for processing lines. The -i switch allows input lines to be edited in-place as they are processed, optionally preserving the original in another file. (See perldoc perlrun for the gory details.)
So all this code is doing is replacing lines that contain "code.google.com/ with "camlistore.org/third_party/code.google.com/.

How to restrict a find and replace to only one column within a CSV?

I have a 4-column CSV file, e.g.:
0001 # fish # animal # eats worms
I use sed to do a find and replace on the file, but I need to limit this find and replace to only the text found inside column 3.
How can I have a find and replace only occur on this one column?
Are you sure you want to be using sed? What about csvfix? Is your CSV nice and simple with no quotes or embedded commas or other nasties that make regexes...a less than satisfactory way of dealing with a general CSV file? I'm assuming that the # is the 'comma' in your format.
Consider using awk instead of sed:
awk -F# '$3 ~ /pattern/ { OFS= "#"; $3 = "replace"; }'
Arguably, you should have a BEGIN block that sets OFS once. For one line of input, it didn't make any odds (and you'd probably be hard-pressed to measure a difference on a million lines of input, too):
$ echo "pattern # pattern # pattern # pattern" |
> awk -F# '$3 ~ /pattern/ { OFS= "#"; $3 = "replace"; }'
pattern # pattern #replace# pattern
$
If sed still seems appealing, then:
sed '/^\([^#]*#[^#]*\)#pattern#\(.*\)/ s//\1#replace#\2/'
For example (and note the slightly different input and output – you can fix it to handle the same as the awk quite easily if need be):
$ echo "pattern#pattern#pattern#pattern" |
> sed '/^\([^#]*#[^#]*\)#pattern#\(.*\)/ s//\1#replace#\2/'
pattern#pattern#replace#pattern
$
The first regex looks for the start of a line, a field of non-at-signs, an at-sign, another field of non-at-signs and remembers the lot; it looks for an at-sign, the pattern (which must be in the third field since the first two fields have been matched already), another at-sign, and then the residue of the line. When the line matches, then it replaces the line with the first two fields (unchanged, as required), then adds the replacement third field, and the residue of the line (unchanged, as required).
If you need to edit rather than simply replace the third field, then you think about using awk or Perl or Python. If you are still constrained to sed, then you explore using the hold space to hold part of the line while you manipulate the other part in the pattern space, and end up re-integrating your desired output line from the hold space and pattern space before printing the line. That's nearly as messy as it sounds; actually, possibly even messier than it sounds. I'd go with Perl (because I learned it long ago and it does this sort of thing quite easily), but you can use whichever non-sed tool you like.
Perl editing the third field. Note that the default output is $_ which had to be reassembled from the auto-split fields in the array #F.
$ echo "pattern#pattern#pattern#pattern" | sh -x xxx.pl
> perl -pa -F# -e '$F[2] =~ s/\s*pat(\w\w)rn\s*/ prefix-$1-suffix /; $_ = join "#", #F; ' "$#"
pattern#pattern# prefix-te-suffix #pattern
$
An explanation. The -p means 'loop, reading lines into $_ and printing $_ at the end of each iteration'. The -a means 'auto-split $_ into the array #F'. The -F# means the field separator is #. The -e is followed by the Perl program. Arrays are indexed from 0 in Perl, so the third field is split into $F[2] (the sigil — the # or $ — changes depending on whether you're working with a value from the array or the array as a whole. The =~ is a match operator; it applies the regex on the RHS to the value on the LHS. The substitute pattern recognizes zero or more spaces \s* followed by pat then two 'word' characters which are remembered into $1, then rn and zero or more spaces again; maybe there should be a ^ and $ in there to bind to the start and end of the field. The replacement is a space, 'prefix-', the remembered pair of letters, and '-suffix' and a space. The $_ = join "#", #F; reassembles the input line $_ from the possibly modified separate fields, and then the -p prints that out. Not quite as tidy as I'd like (so there's probably a better way to do it), but it works. And you can do arbitrary transforms on arbitrary fields in Perl without much difficulty. Perl also has a module Text::CSV (and a high-speed C version, Text::CSV_XS) which can handle really complex CSV files.
Essentially break the line into three pieces, with the pattern you're looking for in the middle. Then keep the outer pieces and replace the middle.
/\([^#]*#[^#]*#\[^#]*\)pattern\([^#]*#.*\)/s//\1replacement\2/
\([^#]*#[^#]*#\[^#]*\) - gather everything before the pattern, including the 3rd # and any text before the math - this becomes \1
pattern - the thing you're looking for
\([^#]*#.*\) - gather everything after the pattern - this becomes \2
Then change that line into \1 then the replacement, then everything after pattern, which is \2
This might work for you:
echo 0001 # fish # animal # eats worms|
sed 's/#/&\n/2;s/#/\n&/3;h;s/\n#.*//;s/.*\n//;y/a/b/;G;s/\([^\n]*\)\n\([^\n]*\).*\n/\2\1/'
0001 # fish # bnimbl # eats worms
Explanation:
Define the field to be worked on (in this case the 3rd) and insert a newline (\n) before it and directly after it. s/#/&\n/2;s/#/\n&/3
Save the line in the hold space. h
Delete the fields either side s/\n#.*//;s/.*\n//
Now process the field i.e. change all a's to b's. y/a/b/
Now append the original line. G
Substitute the new field for the old field (also removing any newlines). s/\([^\n]*\)\n\([^\n]*\).*\n/\2\1/
N.B. That in step 4 the pattern space only contains the defined field, so any number of commands may be carried out here and the result will not affect the rest of the line.

How to reformat a source file to go from 2 space indentations to 3?

This question is nearly identical to this question except that I have to go to three spaces (company coding guidelines) rather than four and the accepted solution will only double the matched pattern. Here was my first attempt:
:%s/^\(\s\s\)\+/\1 /gc
But this does not work because four spaces get replaced by three. So I think that what I need is some way to get the count of how many times the pattern matched "+" and use that number to create the other side of the substitution but I feel this functionality is probably not available in Vim's regex (Let me know if you think it might be possible).
I also tried doing the substitution manually by replacing the largest indents first and then the next smaller indent until I got it all converted but this was hard to keep track of the spaces:
:%s/^ \(\S\)/ \1/gc
I could send it through Perl as it seems like Perl might have the ability to do it with its Extended Patterns. But I could not get it to work with my version of Perl. Here was my attempt with trying to count a's:
:%!perl -pe 'm<(?{ $cnt = 0 })(a(?{ local $cnt = $cnt + 1; }))*aaaa(?{ $res = $cnt })>x; print $res'
My last resort will be to write a Perl script to do the conversion but I was hoping for a more general solution in Vim so that I could reuse the idea to solve other issues in the future.
Let vim do it for you?
:set sw=3<CR>
gg=G
The first command sets the shiftwidth option, which is how much you indent by. The second line says: go to the top of the file (gg), and reindent (=) until the end of the file (G).
Of course, this depends on vim having a good formatter for the language you're using. Something might get messed up if not.
Regexp way... Safer, but less understandable:
:%s#^\(\s\s\)\+#\=repeat(' ',strlen(submatch(0))*3/2)#g
(I had to do some experimentation.)
Two points:
If the replacement starts with \=, it is evaluated as an expression.
You can use many things instead of /, so / is available for division.
The perl version you asked for...
From the command line (edits in-place, no backup):
bash$ perl -pi -e 's{^((?: )+)}{" " x (length($1)/2)}e' YOUR_FILE
(in-place, original backed up to "YOUR_FILE.bak"):
bash$ perl -pi.bak -e 's{^((?: )+)}{" " x (length($1)/2)}e' YOUR_FILE
From vim while editing YOUR_FILE:
:%!perl -pe 's{^((?: )+)}{" " x (length($1)/2)}e'
The regex matches the beginning of the line, followed by (the captured set of) one or more "two space" groups. The substitution pattern is a perl expression (hence the 'e' modifier) which counts the number of "two space" groups that were captured and creates a string of that same number of "three space" groups. If an "extra" space was present in the original it is preserved after the substitution. So if you had three spaces before, you'll have four after, five before will turn into seven after, etc.