Effect of use Encode qw/encode decode from_to/;? - perl

What is the effect of this at the top of a perl script?
use Encode qw/encode decode from_to/;
I found this in code I have taken over, but I don't know what it does.

Short story: for an experienced perl coder who knows what modules are:
The Encode module is for converting perl strings to "some other" format (for which there are many sub-modules that define different formats). Typically, it's used for converting to and from Unicode formats, e.g.:
... to convert a string from Perl's internal format into ISO-8859-1, also known as Latin1:
$octets = encode("iso-8859-1", $string);
decode is for going the other way, and from_to converts a string from one format to another in place:
from_to($octets, "iso-8859-1", "cp1250");
Long story: for someone who doesn't know what a module is/does:
This is the classic way one uses code from elsewhere. "Elsewhere" usually means one of two possibilities, either:
Code written "in-house", i.e. a part of your private application that a past developer has decided to factor out (presumably) because it's applicable in several locations/applications; or
Code written outside the organisation and made available publicly, typically from the Comprehensive Perl Archive Network - CPAN
Now, it's possible - but unlikely - that someone within your organization has created in-house code and coincidentally used the same name as a module on CPAN, so if you check CPAN by searching for "Encode" you can see that there is a module of that name - and that will almost certainly be what you are using. You can read about it on its CPAN documentation page.
The qw/.../ stands for "quote words" and is a simple shorthand for creating a list of strings; in this case it translates to ("encode", "decode", "from_to"), which in turn is a specification of which parts of the Encode module you (or the original author) want to import.
You can read about those parts under the heading "Basic methods" in the documentation (or "POD") page referred to earlier. Don't be put off by the reference to "methods" - many modules (and it appears this one) are written in such a way that they support both an object-oriented and a functional interface. As a result, you will probably see direct calls to the three functions mentioned earlier as if they were written directly in the program itself.
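For a concrete, minimal sketch of how those three imports are typically used together (the string and the encoding names here are just examples):

use Encode qw/encode decode from_to/;

my $string = "caf\x{e9}";                    # Perl's internal (character) string
my $octets = encode("iso-8859-1", $string);  # characters -> Latin-1 bytes
my $again  = decode("iso-8859-1", $octets);  # Latin-1 bytes -> characters

from_to($octets, "iso-8859-1", "UTF-8");     # re-encode the bytes in place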

Related

Using Perl, is there a difference between Win32API::File::MoveFile and CORE::rename on MSWin32?

I see that Win32API::File supports MoveFile(). However, I'm not sure how CORE::rename() is implemented in such a fashion that it should matter. Could someone juxtapose the difference -- specifically for the Win32 Environment -- between
CORE::rename()
File::Copy::move()
and, Win32API::File::MoveFile()
rename has been implemented in a broken fashion since forever; move too, since it uses rename.
Win32::Unicode::File exposes MoveFileW from windows.h as moveW, and apparently handles encoding in a sane fashion, whereas Win32API::File leaves that to the user AFAICS from existing example code.
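A minimal sketch of the Win32::Unicode::File route, based on that description (the exact import and argument list of moveW are my assumptions; check the module's documentation):

use utf8;
use Win32::Unicode::File qw/moveW/;

# Rename a file whose name contains non-ASCII characters.
moveW("läuft.txt", "gelaufen.txt")
    or die "moveW failed";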
Related: How do I copy a file with a UTF-8 filename to another UTF-8 filename in Perl on Windows?

An automated way to do string extraction for Perl/Mason i18n?

I'm currently working on internationalizing a very large Perl/Mason web application, as a team of one (does that make this a death march??). The application is nearing 20 years old, and is written in a relatively old-school Perl style; it doesn't use Moose or another OO module. I'm currently planning on using Locale::Maketext::Gettext to do message lookups, and using GNU Gettext catalog files.
I've been trying to develop some tools to aid in string extraction from our bigass codebase. Currently, all I have is a relatively simple Perl script to parse through source looking for string literals, prompt the user with some context and whether or not the string should be marked for translation, and mark it if so.
There's way too much noise in terms of strings I need to mark versus strings I can ignore. A lot of strings in the source aren't user-facing, such as hash keys, or type comparisons like
if (ref($db_obj) eq 'A::Type::Of::Db::Module')
I do apply some heuristics to each proposed string to see whether I can ignore it off the bat (e.g. I ignore strings that are used for hash lookups, since 99% of the time in our codebase these aren't user-facing). However, despite all that, around 90% of the strings my program shows me are ones I don't care about.
Is there a better way I could help automate my task of string extraction (i.e. something more intelligent than grabbing every string literal from the source)? Are there any commercial programs that do this that could handle both Perl and Mason source?
ALSO, I had a (rather silly) idea for a superior tool, whose workflow I put below. Would it be worth the effort implementing something like this (which would probably take care of 80% of the work very quickly), or should I just submit to an arduous, annoying, manual string extraction process?
Start by extracting EVERY string literal from the source, and putting it into a Gettext PO file.
Then, write a Mason plugin to parse the HTML for each page being served by the application, with the goal of noting strings that the user is seeing.
Use the hell out of the application and try to cover all use cases, building up a store of user facing strings.
Given this store of strings the user saw, do fuzzy matches against strings in the catalog file, and keep track of catalog entries that have a match from the UI.
At the end, anything in the catalog file that didn't get matched would likely not be user facing, so delete those from the catalog.
There are no Perl tools I know of which will intelligently extract strings which might need internationalization vs ones that will not. You're supposed to mark them in the code as you write them, but as you said that wasn't done.
You can use PPI to do the string extraction intelligently.
#!/usr/bin/env perl

use strict;
use warnings;

use Carp;
use PPI;

my $doc = PPI::Document->new(shift);

# See PPI::Node for docs on find
my $strings = $doc->find(sub {
    my($top, $element) = @_;
    print ref $element, "\n";

    # Look for any quoted string or here doc.
    # Does not pick up unquoted hash keys.
    return $element->isa("PPI::Token::Quote") ||
           $element->isa("PPI::Token::HereDoc");
});

# Display the content and location.
for my $string (@$strings) {
    my($line, $row, $col) = @{ $string->location };
    print "Found string at line $line starting at character $col.\n";
    printf "String content: '%s'\n", string_content($string);
}

# *sigh* PPI::Token::HereDoc doesn't have a string method
sub string_content {
    my $string = shift;
    return $string->isa("PPI::Token::Quote")   ? $string->string  :
           $string->isa("PPI::Token::HereDoc") ? $string->heredoc :
           croak "$string is neither a here-doc nor a quote";
}
You can do more sophisticated examination of the tokens surrounding the strings to determine if it's something significant. See PPI::Element and PPI::Node for more details. Or you can examine the content of the string to determine if it's significant.
I can't go much further because "significant" is up to you.
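As a rough illustration of that kind of token examination, here is a sketch of a helper you could call on each found string to filter out likely non-user-facing uses (the specific PPI class checks are my assumptions about how hash subscripts and eq comparisons parse; verify them against your own code with PPI::Dumper):

# Sketch: return true if a string token is probably NOT user facing.
sub probably_ignorable {
    my $string = shift;

    # Strings used as hash subscripts, e.g. $config{'foo'}, parse as a
    # quote inside an expression inside a PPI::Structure::Subscript.
    my $parent = $string->parent;
    return 1 if $parent
             && $parent->isa("PPI::Statement::Expression")
             && $parent->parent
             && $parent->parent->isa("PPI::Structure::Subscript");

    # Strings compared with eq/ne, e.g. ref($db_obj) eq 'A::Type::Of::Db::Module'.
    my $prev = $string->sprevious_sibling;
    return 1 if $prev
             && $prev->isa("PPI::Token::Operator")
             && $prev->content =~ /\A(?:eq|ne)\z/;

    return 0;
}

You would call probably_ignorable($string) inside the loop over the found strings and only prompt for the ones it does not filter out.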
Our Source Code Search Engine is normally used to efficiently search large code bases, using indexes constructed from the lexemes of the languages it knows. That list of languages is pretty broad, including Java, C#, COBOL and ... Perl. The lexeme extractors are language precise (because they are "stolen" from our DMS Software Reengineering Toolkit, a language-agnostic program transformation system, where precision is fundamental).
Given an indexed code base, one can then enter queries to find arbitrary sequences of lexemes in spite of language-specific white space; one can log the hits of such queries and their locations.
The extremely short query:
S
to the Search Engine finds all lexical elements which are classified as strings (keywords, variable names, comments are all ignored; just strings!). (Normally people write more complex queries with regular expression constraints, such as S=*Hello to find strings that end with "Hello")
The relevance here is that the Source Code Search Engine has precise knowledge of lexical syntax of strings in Perl (including specifically elements of interpolated strings and all the wacky escape sequences). So the query above will find all strings in Perl; with logging on, you get all the strings and their locations logged.
This stunt actually works for any language the Search Engine understands, so it is a rather general way to extract the strings for such internationalization tasks.

How can ported code be detected?

If you port code over from one language to another, how can this be detected?
Say you were porting code from C++ to Java, how could you tell?
What would be the difference between a program designed and implemented in Java, and a near identical program ported over to Java?
If the porting is done properly (by people expert in both languages and ready to translate the source language's idioms into the best similar idioms of the target language), there's no way you can tell that any porting has taken place.
If the porting is done incompetently, you can sometimes recognize goofily-transliterated idioms... but that can be hard to distinguish from people writing a new program in a language they barely know, just goofily transliterating the idioms from the language they do know ;-).
Depending on how much effort was put into the intention to hide the porting it could be very easy to impossible to detect.
I would use pattern recognition for this task. Think about the "features" which would indicate code similarities. Extract these features from each code base and compare them.
e.g.:
One feature could be similar symbol names. Extract all symbols using ctags or regular expressions, make them all lower-case, uniq-sort both lists and compare them (a rough sketch of this comparison follows the list below).
Another possible feature:
List of classes + number of members, e.g.:
MyClass1 10
...
List of methods + sequence of control blocks, e.g.:
doSth() if, while, if, if, case
...
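A very rough sketch of the symbol-name comparison mentioned above (the identifier regex and the similarity measure are placeholders, not a tuned heuristic):

use strict;
use warnings;

# Crude "symbol" extraction: grab identifier-looking tokens from a file.
sub symbols {
    my $file = shift;
    open my $fh, '<', $file or die "$file: $!";
    my %seen;
    while (<$fh>) {
        $seen{lc $1}++ while /([A-Za-z_]\w{2,})/g;
    }
    return \%seen;
}

# Jaccard similarity of the two symbol sets.
sub similarity {
    my ($x, $y) = @_;
    my %union  = (%$x, %$y);
    my $common = grep { exists $y->{$_} } keys %$x;
    return keys(%union) ? $common / keys(%union) : 0;
}

my ($file_a, $file_b) = @ARGV;
printf "Symbol overlap: %.2f\n", similarity(symbols($file_a), symbols($file_b));

A high overlap between, say, a C++ file and a Java file of similar size is at least a hint that one was derived from the other.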
Another easy way is to represent the code as a picture - e.g. load the code as text in Word and set the font size to 1. Human beings are very good at comparing pictures. For more ideas on code visualization you may check http://www.se-radio.net/2009/03/episode-130-code-visualization-with-michele-lanza/

Why are Perl source filters bad and when is it OK to use them?

It is "common knowledge" that source filters are bad and should not be used in production code.
When answering a similar, but more specific, question I couldn't find any good references that explain clearly why filters are bad and when they can be safely used. I think now is the time to create one.
Why are source filters bad?
When is it OK to use a source filter?
Why source filters are bad:
Nothing but perl can parse Perl. (Source filters are fragile.)
When a source filter breaks pretty much anything can happen. (They can introduce subtle and very hard to find bugs.)
Source filters can break tools that work with source code. (PPI, refactoring, static analysis, etc.)
Source filters are mutually exclusive. (You can't use more than one at a time -- unless you're psychotic).
When they're okay:
You're experimenting.
You're writing throw-away code.
Your name is Damian and you must be allowed to program in Latin.
You're programming in Perl 6.
Only perl can parse Perl (see this example):
@result = (dothis $foo, $bar);
# Which of the following is it equivalent to?
@result = (dothis($foo), $bar);
@result = dothis($foo, $bar);
This kind of ambiguity makes it very hard to write source filters that always succeed and do the right thing. When things go wrong, debugging is awkward.
After crashing and burning a few times, I have developed the superstitious approach of never trying to write another source filter.
I do occasionally use Smart::Comments for debugging, though. When I do, I load the module on the command line:
$ perl -MSmart::Comments test.pl
so as to avoid any chance that it might remain enabled in production code.
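For reference, Smart::Comments turns specially formatted comments into diagnostics, so a throw-away script like this sketch (variable names invented) stays silent under plain perl but prints the value of $total when loaded with -MSmart::Comments:

# test.pl -- the ### lines are ordinary comments unless
# Smart::Comments is loaded from the command line.
my $total = 0;
$total += $_ for 1 .. 10;

### $total
### Done summing...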
See also: Perl Cannot Be Parsed: A Formal Proof
I don't like source filters because you can't tell what code is going to do just by reading it. Additionally, things that look like they aren't executable, such as comments, might magically be executable with the filter. You (or more likely your coworkers) could delete what you think isn't important and break things.
Having said that, if you are implementing your own little language that you want to turn into Perl, source filters might be the right tool. However, just don't call it Perl. :)
It's worth mentioning that Devel::Declare keywords (and starting with Perl 5.11.2, pluggable keywords) aren't source filters, and don't run afoul of the "only perl can parse Perl" problem. This is because they're run by the perl parser itself, they take what they need from the input, and then they return control to the very same parser.
For example, when you declare a method in MooseX::Declare like this:
method frob ($bubble, $bobble does coerce) {
... # complicated code
}
The word "method" invokes the method keyword parser, which uses its own grammar to get the method name and parse the method signature (which isn't Perl, but it doesn't need to be -- it just needs to be well-defined). Then it leaves perl to parse the method body as the body of a sub. Anything anywhere in your code that isn't between the word "method" and the end of a method signature doesn't get seen by the method parser at all, so it can't break your code, no matter how tricky you get.
The problem I see is the same problem you encounter with any C/C++ macro more complex than defining a constant: It degrades your ability to understand what the code is doing by looking at it, because you're not looking at the code that actually executes.
In theory, a source filter is no more dangerous than any other module, since you could easily write a module that redefines builtins or other constructs in "unexpected" ways. In practice however, it is quite hard to write a source filter in a way where you can prove that it's not going to make a mistake. I tried my hand at writing a source filter that implements the Perl 6 feed operators in Perl 5 (Perl6::Feeds on CPAN). You can take a look at the regular expressions to see the acrobatics required simply to figure out the boundaries of expression scope. While the filter works, and provides a test bed to experiment with feeds, I wouldn't consider using it in a production environment without many, many more hours of testing.
Filter::Simple certainly comes in handy by dealing with 'the gory details of parsing quoted constructs', so I would be wary of any source filter that doesn't start there.
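For the curious, a Filter::Simple-based filter can be as small as the sketch below (the module name and the swap keyword it rewrites are invented for illustration):

package MyApp::Filter::Swap;
use Filter::Simple;

# Rewrite the made-up keyword swap($x, $y) into plain Perl.
# FILTER_ONLY 'code' leaves strings, comments, and POD untouched,
# which is exactly the "gory details" handling referred to above.
FILTER_ONLY code => sub {
    s/\bswap\(\s*(\$\w+)\s*,\s*(\$\w+)\s*\)/($1, $2) = ($2, $1)/g;
};

1;

A script then says use MyApp::Filter::Swap; and can write swap($left, $right); in place of the list assignment.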
In all, it really depends on the filter you are using, and how broad a scope it tries to match against. If it is something simple like a C macro, then it's "probably" OK, but if it's something complicated then it's a judgement call. I personally can't wait to play around with Perl 6's macro system. Finally Lisp won't have anything on Perl :-)
There is a nice example here that shows what trouble you can get into with source filters.
http://shadow.cat/blog/matt-s-trout/show-us-the-whole-code/
They used a module called Switch, which is based on source filters. And because of that, they were unable to find the source of an error message for days.

How can I combine Catalyst and ngettext?

I'm trying to get my head around i18n with Catalyst. As far as I understood the matter, there are two ways to make translations with Perl: Maketext and Gettext. However, I have a requirement to support gettext's .po format so basically I'm going with gettext.
Now, I've found Catalyst::Plugin::I18N and thus Locale::Maketext::Lexicon, which does what I want most of the time. However, it doesn't generate proper pluralization forms, i.e. properly writing msgid_plural and msgstr[x] into the .pot file. This probably happens because Maketext depends on its bracket notation [quant,_1...] and thus has to have the same notation in the translation.
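For comparison, this is what pluralization looks like on the Maketext side with the bracket notation the question mentions (assuming $lh is a language handle from your lexicon class and $count is the item count):

# Maketext pluralizes inside the string itself via [quant,...],
# which is what clashes with gettext's msgid_plural/msgstr[x] model.
print $lh->maketext('You have [quant,_1,new message,new messages]', $count);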
Yet another solution might be using some direct gettext port like Locale::Messages, however this would mean rewriting C::P::I18n.
Does anybody have a proper solution for this problem apart from rewriting several modules? Anything that combines proper gettext with all its features and Catalyst will do.
You will probably get a better answer on the mailing list:
http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
I assume you've also read this:
http://www.catalystframework.org/calendar/2006/18