Getting UTF-8 Request Parameter Strings in mod_perl2 - perl

I'm using mod_perl2 for a website and use CGI::Apache2::Wrapper to get the request parameters for the page (e.g. post data). I've noticed that the string returned by $req->param("parameter") is not decoded as UTF-8. If I use the string as-is I can end up with garbled results, so I need to decode it using Encode::decode_utf8(). Is there any way to either get the parameters already decoded into UTF-8 strings or loop through the parameters and safely decode them?

To get the parameters already decoded, we would need to override the behaviour of the underlying class Apache2::Request from libapreq2, thus losing its XS speed advantage. But even that is not straightforwardly possible, because unfortunately we are sabotaged by the CGI::Apache2::Wrapper constructor:
unless (defined $r and ref($r) and ref($r) eq 'Apache2::RequestRec') {
This is wrong OO programming, it should say
… $r->isa('Apache2::RequestRec')
or perhaps forego class names altogether and just test for behaviour (… $r->can('param')).
With those obstacles, I say it's not worth it. I recommend keeping your existing solution that decodes parameters explicitly. It's clear enough.
To loop over the request parameters, simply call the param method without an argument and you get a list of the parameter names. This is documented; please read more carefully.

Related

What's the correct way to convert from StringBuilder to String?

From what I've seen online, people seem to suggest that the toString() method is to be used, however the documentation states:
Creates a String representation of this object. The default representation is platform dependent. On the java platform it is the concatenation of the class name, "@", and the object's hashcode in hexadecimal.
So it seems like using this method might cause some problems down the line?
There are also mkString and result(), the latter of which seems to make the most sense. But I'm not sure what the differences between these 3 methods are and if that's how result() is supposed to be used.
The toString implementation currently just redirects to the result method anyway, so those two methods will behave in the same way. However, they express slightly different intent:
toString requests a textual representation of the StringBuilder's current state that is "concise but informative (and) that is easy for a person to read". So, theoretically, the (vague) specification of this method does not forbid abbreviating the result, or enhancing conciseness and readability in any other way.
result requests the actual constructed string. No different readings seem possible here.
Therefore, if you want to obtain the resulting string, use result to express your intent as clearly as possible.
In this way, the reader of your code won't have to wonder whether StringBuilder.toString might shorten something for the sake of "conciseness" when the string gets over 9000 kB long, or something like that.
mkString is for something else entirely: it's mostly used for interspersing separators, as in "hello".mkString(",") == "h,e,l,l,o".
Some further links:
The paragraph with "hashcode in hexadecimal" describes the default. It is just documentation inherited from AnyRef, because the creator of StringBuilder didn't bother to provide more detailed documentation.
If you look into code, you'll see that toString is actually just delegating to result.
The documentation of StringBuilder also mentions result() in the introductory overview paragraph.
Just use result().
TL;DR: use result, as stated in the docs.
toString must never be called for any purpose other than quick debugging.
mkString is inherited from the collections hierarchy and will basically create another StringBuilder, so it is very inefficient.

Tell IPython to use an object's `__str__` instead of `__repr__` for output

By default, when IPython displays an object, it seems to use __repr__.
__repr__ is supposed to produce a unique string which could be used to reconstruct an object, given the right environment.
This is distinct from __str__, which is supposed to produce human-readable output.
Now suppose we've written a particular class and we'd like IPython to produce human readable output by default (i.e. without explicitly calling print or __str__).
We don't want to fudge it by making our class's __repr__ do __str__'s job.
That would be breaking the rules.
Is there a way to tell IPython to invoke __str__ by default for a particular class?
This is certainly possible; you just need to implement the instance method _repr_pretty_(self, p, cycle). This is described in the documentation for IPython.lib.pretty. Its implementation could look something like this:
class MyObject:
    def _repr_pretty_(self, p, cycle):
        p.text(str(self) if not cycle else '...')
The p parameter is an instance of IPython.lib.pretty.PrettyPrinter, whose methods you should use to output the text representation of the object you're formatting. Usually you will use p.text(text) which just adds the given text verbatim to the formatted representation, but you can do things like starting and ending groups if your class represents a collection.
The cycle parameter is a boolean that indicates whether a reference cycle is detected - that is, whether you're trying to format the object twice in the same call stack (which leads to an infinite loop). It may or may not be necessary to consider it depending on what kind of object you're using, but it doesn't hurt.
As a bonus, if you want to do this for a class whose code you don't have access to (or, more accurately, don't want to modify), or if you just want to make a temporary change for testing, you can use the IPython display formatter's for_type method, as shown in this example of customizing int display. In your case, you would use
get_ipython().display_formatter.formatters['text/plain'].for_type(
    MyObject,
    lambda obj, p, cycle: p.text(str(obj) if not cycle else '...')
)
with MyObject of course representing the type you want to customize the printing of. Note that the lambda function carries the same signature as _repr_pretty_, and works the same way.
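To make the division of labour concrete, here is a minimal sketch that keeps __repr__ and __str__ separate and routes IPython's display through __str__. The Temperature class and the RecordingPrinter stand-in are hypothetical names made up for this illustration; RecordingPrinter is not part of IPython and only mimics the text() method, so the snippet runs without IPython installed. In a real session IPython supplies the p argument itself.

```python
class Temperature:
    """Hypothetical example class with distinct repr and str forms."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __repr__(self):
        # Unambiguous form, suitable for reconstructing the object.
        return f"Temperature({self.degrees!r})"

    def __str__(self):
        # Human-readable form.
        return f"{self.degrees} degrees"

    def _repr_pretty_(self, p, cycle):
        # IPython calls this hook for display; delegate to __str__.
        p.text("..." if cycle else str(self))


class RecordingPrinter:
    """Stand-in for IPython's pretty printer (assumption: only text() is needed)."""

    def __init__(self):
        self.chunks = []

    def text(self, s):
        self.chunks.append(s)


printer = RecordingPrinter()
Temperature(21.5)._repr_pretty_(printer, cycle=False)
shown = "".join(printer.chunks)
```

Here shown holds the str() form, while repr(Temperature(21.5)) is left untouched for debugging and logging.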

Parameterized logging in slf4j - how does it compare to scala's by-name parameters?

Here are two statements that seem to be generally accepted, but that I can't really get over:
1) Scala's by-name params gracefully replace the ever-so-annoying log4j usage pattern:
if (logger.isDebugEnabled()) {
    logger.debug("expensive string representation, eg: godObject.toString()");
}
because the by-name-parameter (a Scala-specific language feature) doesn't get evaluated before the method invocation.
2) However, this problem is solved by parameterized logging in slf4j:
logger.debug("expensive string representation, eg {}:", godObject[.toString()]);
So, how does this work?
Is there some low-level magic involved in the slf4j library that prevents the evaluation of the parameter before the "debug" method execution? (is that even possible? Can a library impact such a fundamental aspect of the language?)
Or is it just the simple fact that an object is passed to the method - rather than a String? (and maybe the toString() of that object is invoked in the debug( ) method itself, if applicable).
But then, isn't that true for log4j as well? (it does have methods with Object params).
And wouldn't this mean that if you pass a string - as in the code above - it would behave identically to log4j?
I'd really love to have some light shed on this matter.
Thanks!
There is no magic in slf4j. The problem with logging used to be that if you wanted to log let's say
logger.debug("expensive string representation: " + godObject)
then no matter whether the debug level was enabled in the logger or not, you always evaluated godObject.toString(), which can be an expensive operation, and then also the string concatenation. This comes simply from the fact that in Java (and most languages) arguments are evaluated before they're passed to a function.
That's why slf4j introduced logger.debug(String msg, Object arg) (and other variants for more arguments). The whole idea is that you pass cheap arguments to the debug function and it calls toString on them and combines them into a message only if the debug level is on.
Note that by calling
logger.debug("expensive string representation, eg: {}", godObject.toString());
you drastically reduce this advantage, as this way you convert godObject all the time, before you pass it to debug, no matter what debug level is on. You should use only
logger.debug("expensive string representation, eg: {}", godObject);
However, this still isn't ideal. It only spares calling toString and string concatenation. But if your logging message requires some other expensive processing to create the message, it won't help. Like if you need to call some expensiveMethod to create the message:
logger.debug("expensive method, eg: {}",
    godObject.expensiveMethod());
then expensiveMethod is always evaluated before being passed to logger. To make this work efficiently with slf4j, you still have to fall back to
if (logger.isDebugEnabled())
    logger.debug("expensive method, eg: {}",
        godObject.expensiveMethod());
Scala's call-by-name helps a lot in this matter, because it allows you to wrap an arbitrary piece of code into a function object and evaluate that code only when needed. This is exactly what we need. Let's have a look at slf4s, for example. This library exposes methods like
def debug(msg: => String) { ... }
Why no arguments like in slf4j's Logger? Because we don't need them any more. We can write just
logger.debug("expensive representation, eg: " +
    godObject.expensiveMethod())
We don't pass a message and its arguments; we directly pass a piece of code that evaluates to the message, but only if the logger decides to evaluate it. If the debug level isn't on, nothing within logger.debug(...) is ever evaluated; the whole thing is just skipped. expensiveMethod is not called, and no toString calls or string concatenation happen. So this approach is the most general and most flexible. You can pass any expression that evaluates to a String to debug, no matter how complex it is.
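The three strategies discussed above (eager concatenation, parameterized arguments, and a deferred block of code) can be compared side by side. The sketch below is only an analogy in Python using the standard logging module, not slf4j or slf4s code; GodObject, debug_lazy and the logger name are hypothetical names made up for the illustration. A lambda plays the role of Scala's by-name parameter.

```python
import logging


class GodObject:
    """Hypothetical object that counts how often it is converted to a string."""

    def __init__(self):
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return "GodObject(...)"


logger = logging.getLogger("lazy_demo")
logger.setLevel(logging.INFO)          # DEBUG messages are disabled
logger.addHandler(logging.NullHandler())

god = GodObject()

# 1) Eager: the argument is fully built before debug() is even entered.
logger.debug("expensive: " + str(god))
eager_calls = god.str_calls

# 2) Parameterized: the object is passed as-is and only formatted
#    if a record is actually emitted.
before = god.str_calls
logger.debug("expensive: %s", god)
param_calls = god.str_calls - before


# 3) Deferred code: the caller hands over a function, and the helper
#    runs it only when the level is enabled.
def debug_lazy(log, make_message):
    if log.isEnabledFor(logging.DEBUG):
        log.debug(make_message())


before = god.str_calls
debug_lazy(logger, lambda: "expensive: " + str(god))
thunk_calls = god.str_calls - before
```

With DEBUG disabled, only the eager variant pays for the string conversion. The deferred variant additionally skips any other work placed inside the lambda, which is the part that parameterized logging alone cannot avoid.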

javascript arguments are messed up when passed to NPAPI plugin function

I am using a simple NPAPI example from https://github.com/mikma/npsimple.
When I try to pass arguments from javascript to the NPAPI invoke function, the parameters received by the NPAPI function are garbage, though the argument count is passed correctly. The following is the definition of the function in which I am trying to print the "args" array after converting them to char*:
invoke(NPObject* obj, NPIdentifier methodName, const NPVariant *args, uint32_t argCount, NPVariant *result)
Am I missing something here?
It is really hard to tell what you're trying to do based on what you have given us. Specifically, as smorgan requested, we need to know how you are trying to convert the args array to char*.
You are aware of how the NPVariant works? If it's a string, the NPVariant type will be NPVariantType_String and you will need to use both the UTF8Characters member of the NPString struct (which in turn is part of the NPVariant union) and the UTF8Length member, since the string may or may not be null terminated.
Also, keep in mind that depending on what you put in, it may or may not be valid to make your NPVariant a char*. If that helps, great; if it doesn't, please post the contents of the function in which you are trying to handle the input as well as the specific javascript calls that you are making. You haven't given us enough to work with to give you more than guesses as to what problem you may be having.

What is the difference between new Some::Class and Some::Class->new() in Perl?

Many years ago I remember a fellow programmer counselling this:
new Some::Class; # bad! (but why?)
Some::Class->new(); # good!
Sadly now I cannot remember the/his reason why. :( Both forms will work correctly even if the constructor does not actually exist in the Some::Class module but instead is inherited from a parent somewhere.
Neither of these forms is the same as Some::Class::new(), which will not pass the name of the class as the first parameter to the constructor -- so this form is always incorrect.
Even if the two forms are equivalent, I find Some::Class->new() to be much more clear, as it follows the standard convention for calling a method on a module, and in perl, the 'new' method is not special - a constructor could be called anything, and new() could do anything (although of course we generally expect it to be a constructor).
Using new Some::Class is called "indirect" method invocation, and it's bad because it introduces some ambiguity into the syntax.
One reason it can fail is if you have an array or hash of objects. You might expect
dosomethingwith $hashref->{obj}
to be equal to
$hashref->{obj}->dosomethingwith();
but it actually parses as:
$hashref->dosomethingwith->{obj}
which probably isn't what you wanted.
Another problem is if there happens to be a function in your package with the same name as a method you're trying to call. For example, what if some module that you use'd exported a function called dosomethingwith? In that case, dosomethingwith $object is ambiguous, and can result in puzzling bugs.
Using the -> syntax exclusively eliminates these problems, because the method and what you want the method to operate upon are always clear to the compiler.
See Indirect Object Syntax in the perlobj documentation for an explanation of its pitfalls. freido's answer covers one of them (although I tend to avoid that with explicit parens around my function calls).
Larry once joked that it was there to make the C++ programmers feel happy about new, and although people will tell you not to ever use it, you're probably doing it all the time. Consider this:
print FH "Some message";
Have you ever wondered why there was no comma after the filehandle? And there's no comma after the class name in the indirect object notation? That's what's going on here. You could rewrite that as a method call on print:
FH->print( "Some message" );
You may have experienced some weirdness in print if you do it wrong. Putting a comma after the explicit file handle turns it into an argument:
print FH, "some message"; # GLOB(0xDEADBEEF)some message
Sadly, we have this goofiness in Perl. Not everything that got into the syntax was the best idea, but that's what happens when you pull from so many sources for inspiration. Some of the ideas have to be the bad ones.
The indirect object syntax is frowned upon, for good reasons, but that's got nothing to do with constructors. You're almost never going to have a new() function in the calling package. Rather, you should use Package->new() for two other (better?) reasons:
As you said, all other class methods take the form Package->method(), so consistency is a Good Thing
If you're supplying arguments to the constructor, or you're taking the result of the constructor and immediately calling methods on it (if e.g. you don't care about keeping the object around), it's simpler to say e.g.
$foo = Foo->new(type => 'bar', style => 'baz');
Bar->new->do_stuff;
than
$foo = new Foo(type => 'bar', style => 'baz');
(new Bar)->do_stuff;
Another problem is that new Some::Class happens at run time. If there is an error and your testing never branches to this statement, you never know about it until it happens in production. It is better to use Some::Class->new unless you are doing dynamic programming.