Regex to convert JSDoc from coffeescript to javascript - perl

I'm trying to convert a project from coffeescript to javascript. I've successfully used the decaffeinate tool but something remains boring to convert manually: JSdoc comments
In coffescript we took the habit to write them like using the formalism of this example:
##
# This will remove the given file from the machine attachments list. If the file was previously uploaded
# to the server, it will be marked for deletion on the server. Otherwise, it will be simply truncated from
# the attachments array.
# #param file {Object} the file to delete
# #returns {boolean}
##
After using decaffeinate, they were transformed to something like that:
// #
// This will remove the given file from the machine attachments list. If the file was previously uploaded
// to the server, it will be marked for deletion on the server. Otherwise, it will be simply truncated from
// the attachments array.
// #param file {Object} the file to delete
// #returns {boolean}
// #
So I've tried to write a perl regex to make them look like the standard JSdoc syntax. But I'm stuck with the central lines: I can't find to way to put a star at the beginning of each lines ... Here's the best I've came to:
find . -type f -name "*.js" | xargs perl -0777 -i -pe 's~// #\n( +// (.+\n)+)( +)// #~/**\n$1$3 */~gm;'
Which results in:
/**
// This will remove the given file from the machine attachments list. If the file was previously uploaded
// to the server, it will be marked for deletion on the server. Otherwise, it will be simply truncated from
// the attachments array.
// #param file {Object} the file to delete
// #returns {boolean}
*/
But ideally, it should be transformed to:
/**
* This will remove the given file from the machine attachments list. If the file was previously uploaded
* to the server, it will be marked for deletion on the server. Otherwise, it will be simply truncated from
* the attachments array.
* #param file {Object} the file to delete
* #returns {boolean}
*/
Here's an example of a complete file to transform: https://gist.github.com/sylvainbx/96e53b879b4dd7ef7cdd153c3fc3c5b8
Any help would be appreciated :)

You can't easily do this within a single regular expression, as you need to capture and replace the first // # differently from the second // #. The following script shows how you can call a subroutine on the right hand side of the replacement and in that subroutine, you can use three simple regular expressions to convert your comment to JSdoc:
#!perl
use strict;
use warnings;
my $js = <<'JS';
foo
// #
// This will remove the given file from the machine attachments list. If the file was previously uploaded
// to the server, it will be marked for deletion on the server. Otherwise, it will be simply truncated from
// the attachments array.
// #param file {Object} the file to delete
// #returns {boolean}
// #
// # List of trainings
bar
JS
sub uncomment {
my($doc) = #_;
$doc =~ s!^// #!/*!;
$doc =~ s!// #$! */!;
$doc =~ s!^//( #)?! $1 ||'' eq '#' ? '/*' : ' *' !gme;
warn $doc;
$doc
}
$js =~ s!(// #\R// (.+\n)+( *)// #\R)!uncomment($1)!gme;
print $js;
Converted to a stand-alone document-to-JSdoc pipe, the following works:
#!perl
use strict;
use warnings;
local $/;
my $js = <>;
sub uncomment {
my($doc) = #_;
$doc =~ s!^// #!/*!;
$doc =~ s!// #$! */!;
$doc =~ s!^//( #)?! $1 ||'' eq '#' ? '/*' : ' *' !gme;
warn $doc;
$doc
}
$js =~ s!(// #\R// (.+?\n)+( *)// #\R)!uncomment($1)!gme;
print $js;

Related

Perl: should the function TreeBuilder be adapted when it is in a loop foreach?

My code is to enter an actor name and the program, via the given actor's filmography in IMDB, lists on a hash table all the cinematic genres of the movies he has acted in as well as their frequency. However, I have a problem: When I type a name like "brad pitt" or "bruce willis" after running the program at the prompt, execution takes indefinitely. How do you know what the problem is?
Another problem: when I type "nicolas bedos" (an actor name that I entered from the beginning), it works but it seems that the index is only made for a single movie selected in the #url_links list. Should the look_down function of the TreeBuilder module within a foreach loop be adapted? I was telling myself that the #genres list was overwritten on each iteration so I added a push () but the result remains the same.
use LWP::Simple;
use PerlIO::locale;
use HTML::TreeBuilder;
use WWW::Mechanize;
binmode STDOUT, ':locale';
use strict;
use warnings;
print "Enter the actor's name:";
my $acteur1 = <STDIN>; # the user enters the name of the actor
print "We will analyze the filmography of the actor $actor1 by genre\n";
#we put the link with the given actor in Mechanize variable in order to browse the internet links
my $lien1 = "https://www.imdb.com/find?s=nm&q=$acteur1";
my $mech = WWW::Mechanize->new();
$mech->get($lien1); #we access the search page with the get function
$mech->follow_link( url_regex => qr/nm0/i ); #we access the first result using the follow_link function and the regular expression nm0 which is in the URL
my #url_links= $mech->find_all_links( url_regex => qr/title\/tt/i ); #owe insert in an array all the links having as regular expression "title" in their URL
my $nb_links = #url_links; #we record the number of links in the list in this variable
my $tree = HTML::TreeBuilder->new(); #we create the TreeBuilder module to access a specific text on the page via the tags
my %index; #we create a hashing table
my #genres = (); #we create the genre list to insert all the genres encountered
foreach (#url_links) { #we make a loop to browse all the saved links
my $mech2 = WWW::Mechanize->new();
my $html = $_->url(); #we take the url of the link
if ($html =~ m=^/title=) { #if the url starts with "/title"
$mech2 ->get("https://www.imdb.com$html"); #we complete the link
my $content = $mech2->content; #we take the content of the page
$tree->parse($content); #we access the url and we use the tree to find the strings that interest us
#genres = $tree->look_down ('class', 'see-more inline canwrap', #We have as criterion to access the class = "see-more .."
sub {
my $link = $_[0]->look_down('_tag','a'); #new conditions: <a> tags
$link->attr('href') =~ m{genres=}; #autres conditions: "genres" must be in the URL
}
);
}
}
my #genres1 = (); #we create a new list to insert the words found (the genres of films)
foreach my $e (#genres){ #we create a loop to browse the list
my $genre = $e->as_text; #the text of the list element is inserted into the variable
#genres1 = split(/[à| ]/,$genre); #we remove the unnecessary characters that are spaces, at and | which allow to keep that the terms of genre cine
}
foreach my $e (#genres1){ #another loop to filter listing errors (Genres: etc ..) and add the correct words to the hash table
if ($e ne ("Genres:" or "") ) {
$index{$e}++;
}
}
$tree->delete; #we delete the tree as we no longer need it
foreach my $cle (sort{$index{$b} <=> $index{$a}} keys %index){
print "$cle : $index{$cle}\n"; #we display the hash table with the genres and the number of times that appear in the filmography of the given actor
}
Thank you in advance for your help,
wobot
 
The IMDB Conditions of Use say this:
Robots and Screen Scraping: You may not use data mining, robots, screen scraping, or similar data gathering and extraction tools on this site, except with our express written consent as noted below.
So you might want to reconsider what you're doing. Perhaps you could look at the OMDB API instead.

Perl Irssi scripting: rename invalid DCC file

I'm on Windows, using Irssi client irssi-win32-0.8.12.exe.
I'm having problems receiving a file with invalid name:
..nameo_\u2605_name.. (err: DCC can't create file)
How can I strip this invalid part "\u2605" from filename, using script?
This page doesn't help
I think this part of the Irssi source has something to do with it. Starting at line 195
/* if some plugin wants to change the file name/path here.. */
signal_emit("dcc get receive", 1, dcc);
I sure hope Irssi on Windows accepts scripts written in Perl. If this is the case, here's the solution:
use strict;
use warnings;
our $VERSION = "1.0";
our %IRSSI = ();
# interception made by registering signal as first + Irssi::signal_continue()
sub event_ctcp_dccsend {
my ($server, $args, $nick, $addr, $target) = #_;
# split incomming send request args into filename (either before first space or
# quoted), and the rest (IP, port, +optionally filesize)
my ($filename, $rest) = $args =~ /((?:".*")|\S*)\s+(.*)/;
# remember file name for informing sake
my $oldname = $filename;
# replace backslashes with "BSL" (change to anything you wish)
if ($filename =~ s/\\/BSL/g) {
# some info for user
Irssi::print('DCC SEND request from '.$nick.': renamed bad filename '.$oldname.' to '.$filename);
$args = $filename." ".$rest;
# propagate signal; Irssi will proceed the request with altered arguments ($args)
Irssi::signal_continue($server, $args, $nick, $addr, $target);
}
}
# register signal of incoming ctcp 'DCC SEND', before anything else
Irssi::signal_add_first('ctcp msg dcc send', 'event_ctcp_dccsend');
The script intercepts "DCC SEND" ctcp messages and replaces all backslashes in the filename into "BSL" string, then forwards altered arguments of message to any other scripts and Irssi.
If you want to remove all "\uXXXX" instead, use s/\\u\w{4}//g in place of s/\\/BSL/g
I hope it helps!

First 8 bytes are always wrong when downloading a file from my script

I have a Mojolicious Lite script that "gives out" an executable file (user can download the file from the script's URL). I keep encoded data in an inline template in DATA section, then encode it and render_data.
get '/download' => sub {
my $self = shift;
my $hex_data = $self->render_partial( 'TestEXE' );
my $bin_data;
while( $hex_data =~ /([^\n]+)\n?/g ) {
$bin_data .= pack "H".(length $1), $1;
}
my $headers = Mojo::Headers->new;
$headers->add( 'Content-Type', 'application/x-download;name=Test.exe' );
$headers->add( 'Content-Disposition', 'attachment;filename=Test.exe' );
$headers->add( 'Content-Description', 'File Transfer');
$self->res->content->headers($headers);
$self->render_data( $bin_data );
};
__DATA__
## TestEXE.html.ep
4d5a90000300000004000000ffff0000b8000000000000004000000000000000
00000000000000000000000000000000000000000000000000000000b0000000
0e1fba0e00b409cd21b8014ccd21546836362070726f6772616d2063616e6e6f
....
When I run this locally (via built in webserver on http://127.0.0.1:3000/, Win7) I get the correct file (size and contents). But when I run it in CGI mode on shared hosting (Linux), it comes back with correct size, but first 8 bytes of the file are always incorrect (and always different). The rest of the file is correct.
If in my sub i specify $hex_data instead of $bin_data I get what suppose to be there.
I'm at lost.
render_partial isn't what you want.
First, re-encode the executable in base64 format, and specify that the template is base64 encoded (This is assuming hex is not a requirement for your app):
## template-name (base64)
Also, you don't actually need a controller method at all. Mojolicious will handle the process for you - all you have to do is appropriately name the template.
use Mojolicious::Lite;
app->start;
__DATA__
## Test.exe (base64)
...
http://127.0.0.1:3000/Test.exe will then download the file.
-
If you still want to use a controller method for app-specific concerns, get the data template specifically:
use Mojolicious::Lite;
get '/download' => sub {
my $self = shift;
# http://mojolicio.us/perldoc/Mojolicious/Renderer.pm#get_data_template
my $data = $self->app->renderer->get_data_template({}, 'Test.exe');
# Replace content-disposition instead of adding it,
# to prevent duplication from elsewhere in the app
$self->res->headers->header(
'Content-Disposition', 'attachment;filename=name.exe');
$self->render_data($data);
};
app->start;
__DATA__
## Test.exe (base64)
...
http://127.0.0.1:3000/download will get the template, set the header, and then download it as name.exe.

Perl OpenOffice::OODoc - accessing header/footer elements

How do you get elements in a header/footer of a odt doc?
for example I have:
use OpenOffice::OODoc;
my $doc = odfDocument(file => 'whatever.odt');
my $t=0;
while (my $table = $doc->getTable($t))
{
print "Table $t exists\n";
$t++;
}
When I check the tables they are all from the body. I can't seem to find elements for anything in the header or footer?
I found sample code here which led me to the answer:
#! /usr/local/bin/perl
use OpenOffice::OODoc;
my $file='asdf.odt';
# odfContainer is a representation of the zipped odf file
# and all of its parts.
my $container = odfContainer("$file");
# We're going to look at the 'style' part of the container,
# because that's where the header is located.
my $style = odfDocument
(
container => $container,
part => 'styles'
);
# masterPageHeader takes the style name as its argument.
# This is not at all clear from the documentation.
my $masterPageHeader = $style->masterPageHeader('Standard');
my $headerText = $style->getText( $masterPageHeader );
print "$headerText\n"
The master page style defines the look and feel of the document -- think CSS. Apparently 'Standard' is the default name for the master page style of a document created by OpenOffice... that was the toughest nut to crack... once I found the example code, that fell out in my lap.

What's the simplest way to make a HTTP GET request in Perl?

I have some code I've written in PHP for consuming our simple webservice, which I'd also like to provide in Perl for users who may prefer that language. What's the simplest method of making a HTTP request to do that? In PHP I can do it in one line with file_get_contents().
Here's the entire code I want to port to Perl:
/**
* Makes a remote call to the our API, and returns the response
* #param cmd {string} - command string ID
* #param argsArray {array} - associative array of argument names and argument values
* #return {array} - array of responses
*/
function callAPI( $cmd, $argsArray=array() )
{
$apikey="MY_API_KEY";
$secret="MY_SECRET";
$apiurl="https://foobar.com/api";
// timestamp this API was submitted (for security reasons)
$epoch_time=time();
//--- assemble argument array into string
$query = "cmd=" .$cmd;
foreach ($argsArray as $argName => $argValue) {
$query .= "&" . $argName . "=" . urlencode($argValue);
}
$query .= "&key=". $apikey . "&time=" . $epoch_time;
//--- make md5 hash of the query + secret string
$md5 = md5($query . $secret);
$url = $apiurl . "?" . $query . "&md5=" . $md5;
//--- make simple HTTP GET request, put the server response into $response
$response = file_get_contents($url);
//--- convert "|" (pipe) delimited string to array
$responseArray = explode("|", $response);
return $responseArray;
}
LWP::Simple:
use LWP::Simple;
$contents = get("http://YOUR_URL_HERE");
LWP::Simple has the function you're looking for.
use LWP::Simple;
$content = get($url);
die "Can't GET $url" if (! defined $content);
Take a look at LWP::Simple.
For more involved queries, there's even a book about it.
I would use the LWP::Simple module.
Mojo::UserAgent is a great option too!
use Mojo::UserAgent;
my $ua = Mojo::UserAgent->new;
# Say hello to the Unicode snowman with "Do Not Track" header
say $ua->get('www.☃.net?hello=there' => {DNT => 1})->res->body;
# Form POST with exception handling
my $tx = $ua->post('https://metacpan.org/search' => form => {q => 'mojo'});
if (my $res = $tx->success) { say $res->body }
else {
my ($err, $code) = $tx->error;
say $code ? "$code response: $err" : "Connection error: $err";
}
# Quick JSON API request with Basic authentication
say $ua->get('https://sri:s3cret#example.com/search.json?q=perl')
->res->json('/results/0/title');
# Extract data from HTML and XML resources
say $ua->get('www.perl.org')->res->dom->html->head->title->text;`
Samples direct from CPAN page. I used this when I couldn 't get LWP::Simple to work on my machine.
Try the HTTP::Request module.
Instances of this class are usually passed to the request() method of an LWP::UserAgent object.
If it's in Unix and if LWP::Simple isn't installed, you can try:
my $content = `GET "http://trackMyPhones.com/"`;
I think what Srihari might be referencing is Wget, but I would actually recommend (again, on *nix without LWP::Simple) to use cURL:
$ my $content = `curl -s "http://google.com"`;
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
here.
</BODY></HTML>
The -s flag tells curl to be silent. Otherwise, you get curl's progress bar output on standard error every time.