Use perl WWW::Mechanize on a local file - perl

I'm currently working on a Perl script and I use the CPAN module WWW::Mechanize to get HTML pages from websites.
However, I would like to be able to work on offline HTML files as well (that I would save myself beforehand most likely) so I don't need the internet each time I'm trying a new script.
So basically my question is: how can I transform this:
$mech->get( 'http://www.websiteadress.html' );
into this:
$mech->get( 'C:\User\myfile.html' );
I've seen that file:// could be useful but I obviously don't know how to use it as I get errors every time.

The get() method from WWW::Mechanize takes a URL as its argument. So you just need to work out what the correct URL is for your local file. You're on the right lines with the "file://" scheme.
I think you will need:
$mech->get( 'file:///C:/User/myfile.html' );
Note two important things that people often get wrong.
URLs only understand forward slashes (/), so you need to convert Windows' warped backslash (\) monstrosities. Update: As Borodin points out in a comment, this isn't true - you can use backslashes in URLs. However, backslashes often have special meanings in Perl strings, so I'd advise using forward slashes whenever possible.
The scheme is file, which is followed by :// (with two slashes), then the hostname (which is an empty string), then a slash (/), and then your local path (C:/). So that means that there are three slashes after file:. That seems wrong, so people often omit one of them. Update: description made more accurate following advice from Borodin in a comment.
Wikipedia (as always) has a lot more information - file URI scheme
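The conversion described above can be sketched in plain Perl. This is only a minimal illustration and not part of WWW::Mechanize: the helper name windows_path_to_file_url is hypothetical, it assumes an absolute Windows-style path, and it does not percent-encode spaces or other special characters.

```perl
use strict;
use warnings;

# Hypothetical helper (not a WWW::Mechanize function): turn an absolute
# Windows path into a file:// URL with an empty hostname.
sub windows_path_to_file_url {
    my ($path) = @_;
    (my $url_path = $path) =~ s{\\}{/}g;                  # backslashes -> forward slashes
    $url_path = "/$url_path" unless $url_path =~ m{^/};   # the slash after the empty hostname
    return "file://$url_path";
}

my $url = windows_path_to_file_url('C:\User\myfile.html');
print "$url\n";   # prints file:///C:/User/myfile.html
```

The resulting string can then be passed to $mech->get($url). For real paths that may contain spaces or non-ASCII characters, the URI::file module from the URI distribution is probably a safer way to build the URL.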

Related

How to make podlinkcheck not complain about URLs with a fragment

I have a Perl module with a L<...> link like this:
=head1 ...
See L<RFC 8250|https://datatracker.ietf.org/doc/html/rfc8259#section-4>.
=cut
1;
When I run podlinkcheck (version 15) on it, it complains:
themodule.pm:3:5: no module/program/pod "https:"
even though perldoc perlpod says:
Or you can link to a web page:
L<scheme:...>
L<text|scheme:...>
Links to an absolute URL. For example, L<http://www.perl.org/> or L<The Perl Home Page|http://www.perl.org/>.
I want to keep using podlinkcheck for all my actual Perl module links, but how do I tell it to treat links that start with https:// as hyperlinks instead of looking for a Perl module by that name?
(It seems to work when I remove the fragment, i.e., See L<RFC 8250|https://datatracker.ietf.org/doc/html/rfc8259>, but especially for long documents I want to link to a particular section, not just the whole thing. Escaping the # with a backslash and putting double quotes around it did not help.)

Effect of use Encode qw/encode decode from_to/;?

What is the effect of this at the top of a perl script?
use Encode qw/encode decode from_to/;
I found this on code I have taken over, but I don't know what it does.
Short story: for an experienced Perl coder who knows what modules are:
The Encode module converts Perl strings to and from other encodings (many sub-modules define the different formats). Typically, it's used for converting to and from Unicode formats, e.g.:
... to convert a string from Perl's internal format into ISO-8859-1, also known as Latin1:
$octets = encode("iso-8859-1", $string);
decode is for going the other way, and from_to converts a string from one format to another in place:
from_to($octets, "iso-8859-1", "cp1250");
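A short round trip may make the three functions more concrete. This is only a sketch; the sample string and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw/encode decode from_to/;

my $string = "caf\x{e9}";                     # 4 characters; the last is U+00E9

my $latin1 = encode("iso-8859-1", $string);   # 4 octets
my $utf8   = encode("UTF-8", $string);        # 5 octets (U+00E9 needs two)
my $chars  = decode("UTF-8", $utf8);          # back to a 4-character string

my $octets = $latin1;                         # work on a copy
from_to($octets, "iso-8859-1", "UTF-8");      # $octets is rewritten in place

printf "latin1=%d utf8=%d chars=%d\n",
    length($latin1), length($utf8), length($chars);
print "from_to result equals the UTF-8 encoding\n" if $octets eq $utf8;
```

Note that encode and decode return a new value, while from_to modifies its first argument.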
Long story: for someone who doesn't know what a module is/does:
This is the classic way one uses code from elsewhere. "Elsewhere" usually means one of two possibilities, either:
Code written "in-house" - i.e. a part of your private application that a past developer has decided to factor out (presumably) because it's applicable in several locations/applications; or
Code written outside the organisation and made available publicly, typically from the Comprehensive Perl Archive Network - CPAN
Now, it's possible, but unlikely, that someone within your organization has created in-house code and coincidentally given it the same name as a module on CPAN. If you search CPAN for "Encode", you can see that there is a module of that name, and that will almost certainly be what you are using. You can read about it here.
The qw/.../ stands for "quote words" and is a simple short hand for creating a list of strings; in this case it translates to ("encode", "decode", "from_to") which in turn is a specification of what parts of the Encode module you (or the original author) want.
You can read about those parts under the heading "Basic methods" on the documentation (or "POD") page I referred to earlier. Don't be put off by the reference to "methods" - many modules (and it appears this one) are written in such a way that they support both an object-oriented and a functional interface. As a result, you will probably see direct calls to the three functions mentioned earlier as if they were written directly in the program itself.
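To illustrate the two points above (what qw/.../ expands to, and what the import list buys you), here is a small sketch. The variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw/encode decode from_to/;

# qw/.../ is shorthand for a list of quoted strings.
my @short = qw/encode decode from_to/;
my @long  = ("encode", "decode", "from_to");
print "qw gives the same list\n" if "@short" eq "@long";

# Because "encode" was imported, it can be called without the package
# name; the fully qualified call does the same thing.
my $via_import  = encode("UTF-8", "abc");
my $via_package = Encode::encode("UTF-8", "abc");
print "imported and qualified calls agree\n" if $via_import eq $via_package;
```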

How to find literals in source code of Smartforms and in SAPScripts (or reports, if the others can't be done)

I'd like to check hardcoded values in (a lot of) Smartforms and SAPScript forms.
I have found a way to read the source code of both of these, but it seems that I will have to go through a lot of parsing before I get anything reliable.
I've come across function module GET_LITERAL, but that doesn't seem to help me much since I have to specify the offset of the value, if I understood correctly what the function does in the first place.
I also found RS_LITERAL_LIST, but that also doesn't do what I expect.
I also tried searching for reports and methods, but haven't found anything that seemed to help.
A backup plan would be to get some good parsing tool, so do you know of anything like that?
Anyway, any hints would be helpful and appreciated.
[EDIT]
Forgot to mention, the version of my system is 4.6C
If you have a fairly recent version of ABAP, you can use a regex.
Follow the pattern of this example, but use your source as the text and create your own regex. Have it look for any single quotes on the end of a word separated by spaces or any integers with spaces on either side. That's just a start, you might need to work on a better pattern.
String functions count, find, and match
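As a rough idea of what such a pattern might look like, here is a sketch in Perl rather than ABAP, which could be run over source lines exported from the system. Everything here is an assumption for illustration: the helper name find_literals is hypothetical, text literals are assumed to be in single quotes with a doubled quote ('') for an embedded quote, and comment lines are not skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: return quoted text literals and standalone
# integers found in one line of source code.
sub find_literals {
    my ($line) = @_;
    my @found;
    # First alternative: a quoted literal, allowing '' inside it.
    # Second alternative: digits that are not part of an identifier.
    while ($line =~ /('(?:[^']|'')*')|\b(\d+)\b/g) {
        push @found, defined($1) ? $1 : $2;
    }
    return @found;
}

my @literals = find_literals("IF lv_werks = '1000' AND lv_count1 > 5.");
print join(", ", @literals), "\n";   # prints '1000', 5
```

Because the quoted alternative is tried first, digits inside a quoted literal are not reported a second time, and the digit in an identifier such as lv_count1 is ignored.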

Why is the apostrophe sign a valid path separator in Perl

In Perl, you call modules using :: as the path separator. So, if you have a module at site/lib/GD/Image.pm, you call use GD::Image.
However, a long time ago I found out that you can also call use GD'Image and things like my $img = new GD'Image;, and there are also modules on CPAN using that syntax in their names/documentation.
What is the purpose or logic behind that? Is it maybe, as many things in Perl, just a feature intended to humanize sentences and allow you to create and use modules like Acme::Don't?
Does it have any other intention different to ::?
See perlmod for explanation:
The old package delimiter was a single quote, but double colon is now the preferred delimiter
So, the reason is history.
The single quote is an old ADA separator. However, it didn't play well with Emacs, so the double colon became used.
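A tiny sketch showing that both delimiters resolve to the same function. The package Demo::Greeter is made up for this example. As far as I know, recent Perl releases (5.38 onward) emit a deprecation warning for the apostrophe form, so the example silences that category; treat the old syntax as something to recognise rather than to write.

```perl
use strict;
use warnings;

package Demo::Greeter;    # hypothetical package, for illustration only

sub hello { return "hello from " . __PACKAGE__ }

package main;

# Newer perls may warn that the apostrophe delimiter is deprecated.
no warnings 'deprecated';

my $new_style = Demo::Greeter::hello();   # preferred delimiter
my $old_style = Demo'Greeter'hello();     # old delimiter, same function

print "$new_style\n";
print "both forms call the same function\n" if $new_style eq $old_style;
```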
Good God! ADA? Emacs? I am old.

Is File::Spec really necessary?

I know all about the history of different OSes having different path formats, but at this point in time there seems to be a general agreement (with one sorta irrelevant holdout*) about how paths work. I find the whole File::Spec route of path management to be clunky and a useless pain.
Is it really worth having this baroque set of functions to manipulate paths? Please convince me I am being shortsighted.
* Irrelevant because even MS Windows allows forward slashes in paths, which means the only funky thing is the volume at the start and that has never really been a problem for me.
Two major systems have volumes. What's the parent of C:? In unix, it's C:/... In Windows, it's C:... (Unfortunately, most people misuse File::Spec to the point of breaking this.)
There are three different sets of path separators in the major systems. The fact that Windows supports "/" could simplify building paths, but it doesn't help in parsing them or in canonicalising them.
File::Spec also provides useful functions that make it useful even if every system did use the same style of paths, such as the one that turns a path into a relative path.
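A small sketch of both points: the separator differs per system, and abs2rel turns a path into a relative one. The OS-specific classes are called directly here so the result does not depend on the machine running the example; the paths themselves are made up.

```perl
use strict;
use warnings;
use File::Spec::Unix;
use File::Spec::Win32;

# Same path components, different separators per system.
my $unix_file = File::Spec::Unix->catfile("lib", "GD", "Image.pm");
my $win_file  = File::Spec::Win32->catfile("lib", "GD", "Image.pm");
print "$unix_file\n";   # prints lib/GD/Image.pm
print "$win_file\n";    # prints lib\GD\Image.pm

# Turn an absolute path into one relative to a base directory.
my $relative = File::Spec::Unix->abs2rel("/home/user/project/lib", "/home/user");
print "$relative\n";    # prints project/lib
```

In normal code you would call File::Spec->catfile and File::Spec->abs2rel and let the module pick the right class for the current system.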
That said, I never use File::Spec. I use Path::Class instead. Without sacrificing any usability or usefulness, Path::Class provides a much better interface. And it doesn't let users mishandle volumes.
For usual file management inside Perl, no, File::Spec is not necessary; using forward slashes everywhere is much less painful and works on Win32 anyway.
cpanminus is a good example: it is used by lots of people and has been proven to work well on the Win32 platform. It doesn't use File::Spec for most file path manipulation and just uses forward slashes, as was even suggested by experienced Perl-Win32 developers.
The only place I had to use File::Spec's catfile in cpanm, though, is where I extract file paths from a perl error message (Can't locate File\Path.pm blah blah) and create a file path to pass to the command line (i.e. cmd.exe).
Meanwhile, File::Spec provides useful functions such as canonpath and rel2abs - that's not "necessary" per se but really useful.
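For instance, File::Spec's canonpath and rel2abs behave roughly as follows. This is a sketch with made-up paths; File::Spec::Unix is used directly so the result is the same on any machine.

```perl
use strict;
use warnings;
use File::Spec::Unix;

# canonpath does a logical cleanup: duplicate slashes, "/./" and the
# trailing slash are removed (it does not touch the filesystem).
my $clean = File::Spec::Unix->canonpath("/usr//local/./lib/");
print "$clean\n";      # prints /usr/local/lib

# rel2abs makes a relative path absolute against a given base directory.
my $absolute = File::Spec::Unix->rel2abs("lib/Foo.pm", "/home/user/project");
print "$absolute\n";   # prints /home/user/project/lib/Foo.pm
```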
Yes absolutely.
Golden rule of programming: never hard-code string literals.
Edit: One of the best ways to avoid porting issues is to avoid OS-specific constants, especially in the form of inline literals,
e.g. drive + ":/" + path + "/" + filename
It is bad practice, yet we all commit these atrocities in the haste of the moment or because it doesn't matter for that piece of code. File::Spec is there for when a programmer is adhering to gospel programming.
In addition, it provides the values of special and often-used system locations, e.g. the temporary directory (tmpdir) or the null device (devnull), which can vary from one distribution/OS to another.
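A quick sketch of those two lookups, using File::Spec's tmpdir and devnull methods. The value of tmpdir depends on the environment of the machine, so no output is shown for it.

```perl
use strict;
use warnings;
use File::Spec;
use File::Spec::Unix;
use File::Spec::Win32;

# Values for whatever system this script runs on.
print "temp dir:    ", File::Spec->tmpdir,  "\n";
print "null device: ", File::Spec->devnull, "\n";

# The null device differs between systems.
my $unix_null = File::Spec::Unix->devnull;    # /dev/null
my $win_null  = File::Spec::Win32->devnull;   # nul
print "unix: $unix_null, win32: $win_null\n";
```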
If anything, it could probably do with some other members added to it, such as one pointing to the user's home directory.
makepp (makepp.sourceforge.net) has a makefile variable $/ which is either / or \ (on non-Cygwin Win). The reason is that Win accepts / in filenames, but not in command names (where it starts an option).
From http://perldoc.perl.org/File/Spec.html:
catdir
Concatenate two or more directory names to form a complete path ending with a directory. But remove the trailing slash from the resulting string, because it doesn't look good, isn't necessary and confuses OS/2. Of course, if this is the root directory, don't cut off the trailing slash :-)
So, for instance, in this example I wouldn't need the regex to remove the trailing slash if I used catdir.
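The trailing-slash behaviour quoted above can be seen with a couple of made-up directory names. File::Spec::Unix is used directly here so the result does not depend on the system running it.

```perl
use strict;
use warnings;
use File::Spec::Unix;

# Trailing slashes on the parts are dropped from the result.
my $dir = File::Spec::Unix->catdir("/usr/", "local/", "share/");
print "$dir\n";    # prints /usr/local/share

# The root directory keeps its slash.
my $root = File::Spec::Unix->catdir("/");
print "$root\n";   # prints /
```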