How to comment a VIM macro - macros

Is it possible to comment a macro and replay it?
Example
instead of
dddwj
I would like to comment and execute following fragment
dd # Delete line
dw # Delete word
j # Move to next line
Some background
We use PICT to generate testcase inputs (All Pair testing). As this is an iterative process, the macro for generating code needs tweaking between subsequent runs. It's hard to modify a macro when everything is on one line, without comments.
The output of a PICT run might be something like this:
1 cInstallationX Pu380
2 cInstallationY U400
which can be converted to test cases with a macro:
procedure TWatchIntegrationTests.Test1;
begin
//***** Setup
builder
.withInstallation(cInstallationX)
.withIsotope(Pu380)
.Build;
//***** Execute
CreateAndCollectWatches;
//***** Verify
VerifyThat
.toDo;
end;
procedure TWatchIntegrationTests.Test2;
begin
//***** Setup
builder
.withInstallation(cInstallationY)
.withIsotope(U400)
.Build;
//***** Execute
CreateAndCollectWatches;
//***** Verify
VerifyThat
.toDo;
end;

I don't know a good way of doing this with macros, but there are a few options that I can see that might help:
Heavy use of 'normal'
This is the closest to your macro option, but not very nice: make your saved file look like this:
" Delete line
normal dd
" Delete word
normal dw
" Move to next line
normal j
Complicated Substitution
This makes use of regular expressions, but makes those regular expressions be well commented (this is based on your actual example).
let pattern = '^' " Start of line
let pattern .= '\(\d\+\)' " One or more digits (test number)
let pattern .= '\s\+' " Space or tab as delimiter
let pattern .= '\(\k\+\)' " Installation name
let pattern .= '\s\+' " Space or tab as delimiter
let pattern .= '\(\a\+\d\+\)' " One or more alphabetic characters, then one or more digits (isotope)
let pattern .= '\s*$' " Any spaces up to the end of the line
let result = 'procedure TWatchIntegrationTests.Test\1;\r'
let result .= 'begin\r'
let result .= ' //***** Setup\r'
let result .= ' builder\r'
let result .= ' .withInstallation(\2)\r'
let result .= ' .withIsotope(\3)\r'
let result .= ' .Build;\r'
let result .= '\r'
let result .= ' //***** Execute\r'
let result .= ' CreateAndCollectWatches;\r'
let result .= '\r'
let result .= ' //***** Verify\r'
let result .= ' VerifyThat\r'
let result .= ' .toDo;\r'
let result .= 'end;\r'
exe '%s!' . pattern . '!' . result . '!'
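For comparison, the same "commented regular expression" idea can be sketched in Python, where `re.VERBOSE` allows comments inside the pattern itself. This is only an illustration, not part of the Vim solution: the names `PATTERN`, `REPLACEMENT` and `convert` are made up here, and `\w` is assumed to be close enough to Vim's `\k` for this data.

```python
import re

# re.VERBOSE ignores whitespace and "#" comments inside the pattern,
# so every piece can be documented on its own line.
PATTERN = re.compile(
    r"""
    ^(\d+)            # test number: one or more digits
    [ \t]+            # space or tab as delimiter
    (\w+)             # installation name
    [ \t]+            # space or tab as delimiter
    ([A-Za-z]+\d+)    # isotope: letters followed by digits
    [ \t]*$           # any blanks up to the end of the line
    """,
    re.VERBOSE | re.MULTILINE,
)

# \1, \2 and \3 refer to the three captured fields.
REPLACEMENT = "\n".join([
    r"procedure TWatchIntegrationTests.Test\1;",
    "begin",
    "  //***** Setup",
    "  builder",
    r"    .withInstallation(\2)",
    r"    .withIsotope(\3)",
    "    .Build;",
    "",
    "  //***** Execute",
    "  CreateAndCollectWatches;",
    "",
    "  //***** Verify",
    "  VerifyThat",
    "    .toDo;",
    "end;",
])


def convert(text):
    """Replace every PICT output line in text with a filled-in test case."""
    return PATTERN.sub(REPLACEMENT, text)


if __name__ == "__main__":
    print(convert("1 cInstallationX Pu380\n2 cInstallationY U400"))
```

Lines that do not match the pattern are left untouched, which mirrors what the `:s` command does.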
Stick it in a function
Given that this is getting rather complicated, I'd probably do it this way as it gives more room for adjustment. As I see it, you want to split the line on white space and use the three fields, so something like this:
" A command to make it easier to call
" (e.g. :ConvertPICTData or :'<,'>ConvertPICTData)
command! -range=% ConvertPICTData <line1>,<line2>call ConvertPICTData()
" Function that does the work
function! ConvertPICTData() range
" List of lines producing the required template
let template = [
\ 'procedure TWatchIntegrationTests.Test{TestNumber};',
\ 'begin',
\ ' //***** Setup',
\ ' builder',
\ ' .withInstallation({Installation})',
\ ' .withIsotope({Isotope})',
\ ' .Build;',
\ '',
\ ' //***** Execute',
\ ' CreateAndCollectWatches;',
\ '',
\ ' //***** Verify',
\ ' VerifyThat',
\ ' .toDo;',
\ 'end;',
\ '']
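" For each line in the provided range (default, the whole file).
" Walk the range backwards: every entry is appended directly after
" a:lastline, so the entry appended last ends up first.
for linenr in range(a:lastline, a:firstline, -1)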
" Copy the template for this entry
let this_entry = template[:]
" Get the line and split it on whitespace
let line = getline(linenr)
let parts = split(line, '\s\+')
" Make a dictionary from the entries in the line.
" The keys in the dictionary match the bits inside
" the { and } in the template.
let lookup = {'TestNumber': parts[0],
\ 'Installation': parts[1],
\ 'Isotope': parts[2]}
" Iterate through this copy of the template and
" substitute the {..} bits with the contents of
" the dictionary
for template_line in range(len(this_entry))
let this_entry[template_line] =
\ substitute(this_entry[template_line],
\ '{\(\k\+\)}',
\ '\=lookup[submatch(1)]', 'g')
endfor
" Add the filled-in template to the end of the range
call append(a:lastline, this_entry)
endfor
" Now remove the original lines
exe a:firstline.','.a:lastline.'d'
endfunction
Do it in python
This is the sort of task that is probably easier to do in python:
import sys
template = '''
procedure TWatchIntegrationTests.Test%(TestNumber)s;
begin
//***** Setup
builder
.withInstallation(%(Installation)s)
.withIsotope(%(Isotope)s)
.Build;
//***** Execute
CreateAndCollectWatches;
//***** Verify
VerifyThat
.toDo;
end;
'''
input_file = sys.argv[1]
output_file = input_file + '.output'
keys = ['TestNumber', 'Installation', 'Isotope']
with open(input_file, 'r') as fhIn, open(output_file, 'w') as fhOut:
    for line in fhIn:
        # split() with no argument splits on any run of whitespace
        # and drops the trailing newline
        parts = line.split()
        if len(parts) == len(keys):
            fhOut.write(template % dict(zip(keys, parts)))
To use this, save it as (e.g.) pict_convert.py and run:
python pict_convert.py input_file.txt
It will produce input_file.txt.output as a result.

First of all, let me point out that @Al has posted several excellent solutions, and I suggest you use those rather than what I am about to post, especially since my approach does not seem to work under all circumstances (for reasons I do not understand).
Having said that, the following seems to do what you want, at least in this case. It assumes <Space> is not used to move the cursor in normal mode and maps it to :", where " is the comment character in command-line mode. <Space> therefore starts a comment, and the newline at the end of the line ends it. The # is only there to make it clearer that we are dealing with comments. (^[ should be entered as a single escape character.)
:nmap <Space> :"
iHallo wereld^[ # Insert text (in dutch, better change that)
Fe # Move backwards to e
x # Delete
; # Move to next e
ro # Change to o
Fa # Move backwards to a
re # Change to e
A!^[ # Add exclamation mark

Related

Can somebody explain this obfuscated perl regexp script?

This code is taken from the HackBack DIY guide to rob banks by Phineas Fisher. It outputs a long text (The Sixth Declaration of the Lacandon Jungle). Where does it fetch it? I don't see any alphanumeric characters at all. What is going on here? And what does the -r switch do? It seems undocumented.
perl -Mre=eval <<\EOF
''
=~(
'(?'
.'{'.(
'`'|'%'
).("\["^
'-').('`'|
'!').("\`"|
',').'"(\\$'
.':=`'.(('`')|
'#').('['^'.').
('['^')').("\`"|
',').('{'^'[').'-'.('['^'(').('{'^'[').('`'|'(').('['^'/').('['^'/').(
'['^'+').('['^'(').'://'.('`'|'%').('`'|'.').('`'|',').('`'|'!').("\`"|
'#').('`'|'%').('['^'!').('`'|'!').('['^'+').('`'|'!').('['^"\/").(
'`'|')').('['^'(').('['^'/').('`'|'!').'.'.('`'|'%').('['^'!')
.('`'|',').('`'|'.').'.'.('`'|'/').('['^')').('`'|"\'").
'.'.('`'|'-').('['^'#').'/'.('['^'(').('`'|('$')).(
'['^'(').('`'|',').'-'.('`'|'%').('['^('(')).
'/`)=~'.('['^'(').'|</'.('['^'+').'>|\\'
.'\\'.('`'|'.').'|'.('`'|"'").';'.
'\\$:=~'.('['^'(').'/<.*?>//'
.('`'|"'").';'.('['^'+').('['^
')').('`'|')').('`'|'.').(('[')^
'/').('{'^'[').'\\$:=~/('.(('{')^
'(').('`'^'%').('{'^'#').('{'^'/')
.('`'^'!').'.*?'.('`'^'-').('`'|'%')
.('['^'#').("\`"| ')').('`'|'#').(
'`'|'!').('`'| '.').('`'|'/')
.'..)/'.('[' ^'(').'"})')
;$:="\."^ '~';$~='@'
|'(';$^= ')'^'[';
$/='`' |'.';
$,= '('
EOF
The basic idea of the code you posted is that each alphanumeric character has been replaced by a bitwise operation between two non-alphanumeric characters. For instance,
'`'|'%'
(5th line of the "star" in your code)
is a bitwise OR between a backquote and a percent sign, whose code points are 96 and 37 respectively. Their OR is 101, which is the code point of the letter "e". The following lines all print the same thing:
say '`' | '%' ;
say chr( ord('`' | '%') );
say chr( ord('`') | ord('%') );
say chr( 96 | 37 );
say chr( 101 );
say "e"
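The same arithmetic can be checked outside Perl. The following is a small Python sketch (the helper name `combine` is made up for this illustration) that applies the bitwise operator to the code points of two one-character strings and decodes the first four obfuscated characters of the script:

```python
def combine(a, b, op):
    """Combine the code points of two one-character strings with | or ^."""
    x, y = ord(a), ord(b)
    return chr(x | y if op == "|" else x ^ y)


# The first four character pairs of the obfuscated script, in order.
pairs = [("`", "%", "|"), ("[", "-", "^"), ("`", "!", "|"), ("`", ",", "|")]
word = "".join(combine(a, b, op) for a, b, op in pairs)

print(ord("`") | ord("%"))  # 96 | 37 = 101
print(word)                 # the four decoded characters
```

Running it shows that the first pairs spell out the keyword that the rest of the answer builds on.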
Your code starts with (ignoring whitespace, which doesn't matter):
'' =~ (
The corresponding closing bracket is 28 lines later:
^'(').'"})')
(C-f this pattern to see it on the web-page; I used my editor's matching parenthesis highlighting to find it)
We can assign everything between the opening and closing parentheses to a variable that we can then print:
$x = '(?'
.'{'.(
'`'|'%'
).("\["^
'-').('`'|
'!').("\`"|
',').'"(\\$'
.':=`'.(('`')|
'#').('['^'.').
('['^')').("\`"|
',').('{'^'[').'-'.('['^'(').('{'^'[').('`'|'(').('['^'/').('['^'/').(
'['^'+').('['^'(').'://'.('`'|'%').('`'|'.').('`'|',').('`'|'!').("\`"|
'#').('`'|'%').('['^'!').('`'|'!').('['^'+').('`'|'!').('['^"\/").(
'`'|')').('['^'(').('['^'/').('`'|'!').'.'.('`'|'%').('['^'!')
.('`'|',').('`'|'.').'.'.('`'|'/').('['^')').('`'|"\'").
'.'.('`'|'-').('['^'#').'/'.('['^'(').('`'|('$')).(
'['^'(').('`'|',').'-'.('`'|'%').('['^('(')).
'/`)=~'.('['^'(').'|</'.('['^'+').'>|\\'
.'\\'.('`'|'.').'|'.('`'|"'").';'.
'\\$:=~'.('['^'(').'/<.*?>//'
.('`'|"'").';'.('['^'+').('['^
')').('`'|')').('`'|'.').(('[')^
'/').('{'^'[').'\\$:=~/('.(('{')^
'(').('`'^'%').('{'^'#').('{'^'/')
.('`'^'!').'.*?'.('`'^'-').('`'|'%')
.('['^'#').("\`"| ')').('`'|'#').(
'`'|'!').('`'| '.').('`'|'/')
.'..)/'.('[' ^'(').'"})';
print $x;
This will print:
(?{eval"(\$:=`curl -s https://enlacezapatista.ezln.org.mx/sdsl-es/`)=~s|</p>|\\n|g;\$:=~s/<.*?>//g;print \$:=~/(SEXTA.*?Mexicano..)/s"})
The remainder of the code is a bunch of assignments to some variables, probably there only to complete the pattern. The end of the star is:
$:="\."^'~';
$~='@'|'(';
$^=')'^'[';
$/='`'|'.';
$,='(';
Which just assigns simple one-character strings to some variables.
Back to the main code:
(?{eval"(\$:=`curl -s https://enlacezapatista.ezln.org.mx/sdsl-es/`)=~s|</p>|\\n|g;\$:=~s/<.*?>//g;print \$:=~/(SEXTA.*?Mexicano..)/s"})
This code is inside a regex which is matched against an empty string (don't forget that we first had '' =~ (...)). (?{...}) inside a regex runs the code in the .... With some whitespace added, and taking the string out of the eval, this gives us:
# fetch a URL from the web using curl quietly (-s)
($: = `curl -s https://enlacezapatista.ezln.org.mx/sdsl-es/`)
# replace end of paragraphs with newlines in the HTML fetched
=~ s|</p>|\n|g;
# Remove all HTML tags
$: =~ s/<.*?>//g;
# Print everything between SEXTA and Mexicano (+2 chars)
print $: =~ /(SEXTA.*?Mexicano..)/s
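To make the three text-processing steps concrete without touching the network, here is a Python sketch that runs them on a small made-up HTML string instead of the curl output. The function name `extract` and the sample text are assumptions for illustration only.

```python
import re


def extract(html):
    """Mimic the deobfuscated Perl: </p> -> newline, strip tags, cut out the text."""
    text = html.replace("</p>", "\n")      # s|</p>|\n|g
    text = re.sub(r"<.*?>", "", text)      # s/<.*?>//g
    # /(SEXTA.*?Mexicano..)/s : DOTALL lets "." cross newlines like Perl's /s
    match = re.search(r"(SEXTA.*?Mexicano..)", text, re.DOTALL)
    return match.group(1) if match else None


# Made-up stand-in for the downloaded page.
sample = "<p>header</p><p>SEXTA <b>text</b></p><p>by the Mexicano!!</p><p>footer</p>"
print(extract(sample))
```

The non-greedy `.*?` stops at the first "Mexicano", and the two trailing dots pick up the two characters that follow it.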
You can automate this deobfuscation process by using B::Deparse: running
perl -MO=Deparse yourcode.pl
Will produce something like:
'' =~ m[(?{eval"(\$:=`curl -s https://enlacezapatista.ezln.org.mx/sdsl-es/`)=~s|</p>|\\n|g;\$:=~s/<.*?>//g;print \$:=~/(SEXTA.*?Mexicano..)/s"})];
$: = 'P';
$~ = 'h';
$^ = 'r';
$/ = 'n';
$, = '(';

Parse single quoted string using Marpa::R2 perl

How do I parse a single-quoted string using Marpa::R2?
In my code below, a '\' is added to single-quoted strings on parsing.
Code:
use strict;
use Marpa::R2;
use Data::Dumper;
my $grammar = Marpa::R2::Scanless::G->new(
{ default_action => '[values]',
source => \(<<'END_OF_SOURCE'),
lexeme default = latm => 1
:start ::= Expression
# include begin
Expression ::= Param
Param ::= Unquoted
| ('"') Quoted ('"')
| (') Quoted (')
:discard ~ whitespace
whitespace ~ [\s]+
Unquoted ~ [^\s\/\(\),&:\"~]+
Quoted ~ [^\s&:\"~]+
END_OF_SOURCE
});
my $input1 = 'foo';
#my $input2 = '"foo"';
#my $input3 = '\'foo\'';
my $recce = Marpa::R2::Scanless::R->new({ grammar => $grammar });
print "Trying to parse:\n$input1\n\n";
$recce->read(\$input1);
my $value_ref = ${$recce->value};
print "Output:\n".Dumper($value_ref);
Outputs:
Trying to parse:
foo
Output:
$VAR1 = [
[
'foo'
]
];
Trying to parse:
"foo"
Output:
$VAR1 = [
[
'foo'
]
];
Trying to parse:
'foo'
Output:
$VAR1 = [
[
'\'foo\''
]
]; (don't want it to be parsed like this)
Above are the outputs for all three inputs. I don't want the third one to come back with the '\' and the single quotes; I want it to be parsed like the second output. Please advise.
Ideally, it should just pick the content between single quotes according to Param ::= (') Quoted (')
The other answer regarding Data::Dumper output is correct. However, your grammar does not work the way you expect it to.
When you parse the input 'foo', Marpa will consider the three Param alternatives. The predicted lexemes at that position are:
Unquoted ~ [^\s\/\(\),&:\"~]+
'"'
') Quoted ('
Yes, the last is literally ) Quoted (, not anything containing a single quote.
Even if it were ([']) Quoted ([']): Due to longest token matching, the Unquoted lexeme will match the entire input, including the single quote.
What would happen for an input like " foo " (with double quotes)? Now, only the '"' lexeme would match, then any whitespace would be discarded, then the Quoted lexeme matches, then any whitespace is discarded, then closing " is matched.
To prevent this whitespace-skipping behaviour and to prevent the Unquoted rule from being preferred due to LATM, it makes sense to describe quoted strings as lexemes. For example:
Param ::= Unquoted | Quoted
Unquoted ~ [^'"]+
Quoted ~ DQ | SQ
DQ ~ '"' DQ_Body '"'
DQ_Body ~ [^"]*
SQ ~ ['] SQ_Body [']
SQ_Body ~ [^']*
These lexemes will then include any quotes and escapes, so you need to post-process the lexeme contents. You can either do this using the event system (which is conceptually clean, but a bit cumbersome to implement), or adding an action that performs this processing during parse evaluation.
Since lexemes cannot have actions, it is usually best to add a proxy production:
Param ::= Unquoted | Quoted
Unquoted ~ [^'"]+
Quoted ::= Quoted_Lexeme action => process_quoted
Quoted_Lexeme ~ DQ | SQ
DQ ~ '"' DQ_Body '"'
DQ_Body ~ [^"]*
SQ ~ ['] SQ_Body [']
SQ_Body ~ [^']*
The action could then do something like:
sub process_quoted {
my (undef, $s) = @_;
# remove delimiters from double-quoted string
return $1 if $s =~ /^"(.*)"$/s;
# remove delimiters from single-quoted string
return $1 if $s =~ /^'(.*)'$/s;
die "String was not delimited with single or double quotes";
}
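The post-processing step is independent of Marpa, so its logic can be sketched in Python as well. This is only an illustration of what the action does to the lexeme text; the function name simply mirrors the Perl one.

```python
def process_quoted(lexeme):
    """Strip one pair of matching single or double quotes from a lexeme."""
    if len(lexeme) >= 2 and lexeme[0] == lexeme[-1] and lexeme[0] in "\"'":
        return lexeme[1:-1]
    raise ValueError("string was not delimited with single or double quotes")


print(process_quoted("'foo'"))
print(process_quoted('"foo bar"'))
```

Because the quotes are part of the lexeme, whitespace inside them survives, which is the point of moving the quoted string into a single lexeme.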
Your result doesn't contain \', it contains '. Dumper merely formats the result like that so it's clear what's inside the string and what isn't.
You can test this behavior for yourself:
use Data::Dumper;
my $tick = chr(39);
my $back = chr(92);
print "Tick Dumper: " . Dumper($tick);
print "Tick Print: " . $tick . "\n";
print "Backslash Dumper: " . Dumper($back);
print "Backslash Print: " . $back . "\n";
You can see a demo here: https://ideone.com/d1V8OE
If you don't want the output to contain single quotes, you'll probably need to remove them from the input yourself.
I am not so familiar with Marpa::R2, but could you try to use an action on the Expression rule:
Expression ::= Param action => strip_quotes
Then, implement a simple quote stripper like:
sub MyActions::strip_quotes {
$_[1][0] =~ s/^'|'$//gr;
}

encode special character in html entities in perl

I have a string where special characters like ! or " or & or @ or #, ... can appear. How can I automatically convert every special character in the string
str = " Hello "XYZ" this 'is' a test & so *n @."
to its HTML entity, so that I get this:
str = " Hello &quot ;XYZ&quot ; this &#39 ;is&#39 ; a test &amp ; so on &#64 ;"
I tried
$str=HTML::Entities::encode_entities($str);
but it only does part of the work: the @ is not transformed into &#64 ;
SOLUTION:
1) with your help (Quentin and vol7ron) I came up with this solution(1)
$HTML::Entities::char2entity{'@'} = '&#64;';
$HTML::Entities::char2entity{'!'} = '&#33;';
$HTML::Entities::char2entity{'#'} = '&#35;';
$HTML::Entities::char2entity{'%'} = '&#37;';
$HTML::Entities::char2entity{'.'} = '&#46;';
$HTML::Entities::char2entity{'*'} = '&#42;';
$str=HTML::Entities::encode_entities($str, q{@"%'.&#*$^!});
2) and I found a shorter(better) solution(2) found it here:
$str=HTML::Entities::encode_entities($str, '\W');
the '\W' does the job
@vol7ron with solution(1) you will need to specify the characters you want to translate, as Quentin mentioned earlier, even if they are in the translation table.
@ isn't transformed because it isn't considered to be a "special character". It can be represented in ASCII and has no significant meaning in HTML.
You can expand the range of characters that are converted with the second argument to the function you are using, as described in the documentation.
You can manually add a character to the translation table (char2entity hash).
$HTML::Entities::char2entity{'@'} = '&#64;';
my $str = q{ Hello "XYZ" this 'is' a test & so on @};
my $encoded = HTML::Entities::encode_entities( $str, q{<>&"'@} );
The above adds @, which will be translated to &#64;.
You then need to specify the characters you want to translate; if you don't, it uses <>&", so I added both @ and '. Notice that I didn't have to add the ' to the translation table, because it's already there by default.
You don't need to add characters in the 0-255 range to the char2entity hash, since the module will do it automatically.
Note: Setting the char2entity for @ was done as an example. The module automatically sets numerical entities for characters in the 0-255 range that weren't found. You'd have to use it for Unicode characters beyond that range, though.
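The difference between "encode the default unsafe set" and "encode every non-word character" can be sketched in Python with the standard library. This illustrates the idea only and is not a drop-in replacement for HTML::Entities; the function names are made up, and the second one always emits numeric references.

```python
import html
import re


def encode_default(s):
    """Escape only the characters that are significant in HTML: & < > " '."""
    return html.escape(s, quote=True)


def encode_nonword(s):
    """Replace every non-word character (what \\W matches) with a numeric reference."""
    return re.sub(r"\W", lambda m: "&#%d;" % ord(m.group()), s)


print(encode_default('a test & so on @'))
print(encode_nonword("so on @"))
```

As with the '\W' solution, the second function also encodes spaces and punctuation, which may be more than is wanted.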
Cheap, dirty, and ugly, but works:
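my %translations;
$translations{'"'} = '&quot;';
$translations{'\''} = '&#39;';
# etc...
# Caution: if '&' is added to the table it has to be replaced first,
# otherwise the '&' of entities inserted earlier gets encoded again.
sub transform
{
my $str = shift;
foreach my $character (keys(%translations))
{
# \Q...\E keeps regex metacharacters such as . or * literal
$str =~ s/\Q$character\E/$translations{$character}/g;
}
return $str;
}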

Copying a string(passed as command line arguments to Perl) into text file

I have a string containing lots of text with white-spaces like:
String str = "abc xyz def";
I am now passing this string as a command line argument to a perl file using C# as in:
Process p = new Process();
p.StartInfo.FileName = "c:\\perl\\bin\\perl.exe";
p.StartInfo.Arguments = "c:\\root\\run_cmd.pl " + str + " " + text_file;
In the run_cmd.pl file, I have the following:
open FILE, ">$ARGV[1]" or die "Failed opening file";
print FILE $ARGV[0];
close FILE;
On printing, I am able to copy only part of the string i.e. "abc" into text_file since Perl interprets it as a single argument.
My question is, is it possible for me to copy the entire string into the text file including the white spaces?
If you want a whitespace-separated argument to be treated as a single argument, with most programs you need to surround it with " ",
e.g. run_cmd.pl "abc xyz def" filename
Try
p.StartInfo.Arguments = "c:\\root\\run_cmd.pl \"" + str + "\" " + text_file;
Side note:
I don't know about Windows, but Linux limits both the number of arguments and the maximum length of a single argument, so you might want to consider passing the string some other way, for example by reading it from a temporary file.
It's a little bit of a hack, but
$ARGV[$#ARGV]
would be the last item in @ARGV, and
@ARGV[0 .. ($#ARGV - 1)]
would be everything before that.
It's not perl -- it's your shell. You need to put quotes around the arguments:
p.StartInfo.Arguments = "c:\\root\\run_cmd.pl '" + str + "' " + text_file;
If text_file comes from user input, you'll likely want to quote that, too.
(You'll also need to escape any existing quotes in str or text_file; I'm not sure what the proper way to escape a quote in Windows is)
@meidwar said: "you might want to consider passing the string some other way, reading it from a tmp file for example"
I'll suggest you look into a piped-open. See http://search.cpan.org/~jhi/perl-5.8.0/pod/perlopentut.pod#Pipe_Opens and http://perldoc.perl.org/perlipc.html#Using-open()-for-IPC
These let you send as much data as your called code can handle and are not subject to limitations of the OS's command-line.
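Both ideas from this thread (keeping the text as one argument, and piping it instead of putting it on the command line) can be sketched in Python. This is an illustration only: it uses Python as both parent and child process rather than C# and Perl, and the helper names are made up.

```python
import subprocess
import sys

text = "abc xyz def"


def count_args(args):
    """Start a child process and report how many arguments it received."""
    child = "import sys; print(len(sys.argv) - 1)"
    result = subprocess.run([sys.executable, "-c", child, *args],
                            capture_output=True, text=True, check=True)
    return int(result.stdout.strip())


def echo_via_pipe(data):
    """Send data to a child on standard input instead of the command line."""
    child = "import sys; sys.stdout.write(sys.stdin.read())"
    result = subprocess.run([sys.executable, "-c", child],
                            input=data, capture_output=True, text=True, check=True)
    return result.stdout


print(count_args([text, "out.txt"]))         # text kept together as one argument
print(count_args(text.split() + ["out.txt"]))  # text split on whitespace
print(echo_via_pipe(text))
```

Passing the arguments as a list lets the library do the quoting, and the pipe variant is not subject to command-line length limits.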

How do I repeat a character n times in a string?

I am learning Perl, so please bear with me for this noob question.
How do I repeat a character n times in a string?
I want to do something like below:
$numOfChar = 10;
s/^\s*(.*)/' ' x $numOfChar$1/;
By default, substitutions take a string as the part to substitute. To execute code in the substitution process you have to use the e flag.
$numOfChar = 10;
s/^(.*)/' ' x $numOfChar . $1/e;
This will add $numOfChar spaces to the start of your text. To do it for every line in the text, either use the -p flag (for quick, one-line processing), with single quotes so that the shell does not expand $n and $1:
cat foo.txt | perl -p -e '$n = 10; s/^(.*)/" " x $n . $1/e' > bar.txt
or, if it's part of a larger script, use the g and m modifiers (g for global, i.e. repeated substitution, and m to make ^ match at the start of each line):
$n = 10;
$text =~ s/^(.*)/' ' x $n . $1/mge;
Your regular expression can be written as:
$numOfChar = 10;
s/^(.*)/(' ' x $numOfChar).$1/e;
but - you can do it with:
s/^/' ' x $numOfChar/e;
Or without using regexps at all:
$_ = ( ' ' x $numOfChar ) . $_;
You're right. Perl's x operator repeats a string a number of times.
print "test\n" x 10; # prints 10 lines of "test"
EDIT: To do this inside a regular expression, it would probably be best (a.k.a. most maintainer friendly) to just assign the value to another variable.
my $spaces = " " x 10;
s/^\s*(.*)/$spaces$1/;
There are ways to do it without an extra variable, but it's just my $0.02 that it'll be easier to maintain if you do it this way.
EDIT: I fixed my regex. Sorry I didn't read it right the first time.
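For readers coming from other languages, the same two approaches (plain string repetition, and a substitution whose replacement is computed by code) look like this in Python. The function names are made up for this sketch; `*` plays the role of Perl's `x`, and a function passed to `re.sub` plays the role of the e flag.

```python
import re


def indent_line(line, n):
    """Prefix a single line with n spaces using string repetition."""
    return " " * n + line


def indent_text(text, n):
    """Prefix every non-empty line; the lambda is evaluated per match, like /e."""
    return re.sub(r"^(.+)", lambda m: " " * n + m.group(1), text,
                  flags=re.MULTILINE)


print(indent_line("test", 10))
print(indent_text("first\nsecond\n", 4))
```

Using `.+` rather than `.*` leaves empty lines (and the position after a trailing newline) untouched.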