use WWW::Mechanize;
$mech = new WWW::Mechanize( onerror => \&mecherror );
$mech->get("http://stackoverflow.comxxxx");
print $mech->content;
sub mecherror {
$mech->get("http://stackoverflow.com");
}
The output on line 4 is an error string relating to the first failed get and not the content of the get executed in sub mecherror. Why?
The onerror callback of WWW::Mechanize is meant to supply a response to error
onerror => \&func
Reference to a die-compatible function, such as Carp::croak, that is called when there's a fatal error.
It is clearly not intended for recovery or any use of the object that is being constructed.
That said, your call in onerror works, but the object doesn't get to know about it.
use warnings;
use strict;
use feature 'say';
use WWW::Mechanize;
my $mech = new WWW::Mechanize( onerror => \&mecherror );
$mech->get("http://stackoverflow.comxxxx");
say $mech->content;
sub mecherror {
my $response = $mech->get("http://stackoverflow.com");
# say $mech->content;
say "response is " . ref($response);
say $response->decoded_content;
}
This shows that we duly got an HTTP::Response object, and prints out the page. Then we may hope to pass a reference to the callback to connect it to the calling code. However, a mechanism for this is not provided -- this is not supported. We are warned against messing with internals, though.
As for why the object isn't updated, it depends on the callback implementation. From the source code we see that the code reference goes into object's data and is run when needed via the wrapper
sub die {
my $self = shift;
return unless my $handler = $self->{onerror};
return $handler->(#_);
}
A lot of other code is involved when this triggers, while nothing is done here to change the object's state. That is just unsupported and may result in undefined behavior.
Note that the callback here knows what $mech is because it is global, so it has the right object.
To summarize discussions in comments, it is plausible that the page retrieved by the callback gets overwritten by the error message. We get to see this when invoking content, and it appears to be due to this part of the method (see source)
$content = $self->response()->decoded_content(charset => 'none');
The decoded_content method is from HTTP::Response, inherited via LWP::UserAgent, and the error message indeed seems to have come from that class. (Neither W::M nor LWP::UA have a method named "decode_content.") This is summarized in W::M::content page
$mech->content(...)
Returns the content that the mech uses internally for the last page fetched. Ordinarily this is the same as $mech->response()->decoded_content(), [ ... ]
However, we anyway couldn't rely on the object being in a consistent state, as discussed.
Related
I was trying to send a mail with response from a URL using MIME::Lite::HTML. But it shows
Can't call method "replace" on an undefined value at
/usr/local/share/perl/5.18.2/MIME/Lite/HTML.pm
The code is like this
#!/usr/bin/perl
use MIME::Lite;
use MIME::Lite::HTML;
use strict;
my $mailed_html = new MIME::Lite::HTML(To => "user\#sample.com",
From => "admin\#sample.com",
Subject => "Thank you mail"); # creates MIME::Lite::HTML object
my $msg = $mailed_html->parse("http://sample.com/thankyou.html?id=19&mode=test");
$msg->send;
The variable $mailed_html have MIME::Lite::HTML object, confirmed by printing using Data::Dumper.
Got the answer. Actually there was no error with the code. The request for parsing message using MIME::Lite::HTML was not reaching the server because of DNS resolve issue. If the request can't reach then it will return the object with no data and so replace function cannot be executed for empty data.
Thanks for your responses.
I've got an object that stores an LWP::UserAgent. I want to use different cookie jars for different calls with that UA, so I decided to make the cookie_jar local when doing a call.
The following code shows what I did without debug stuff (for reading, not running). Below is another version with lots of debugging output.
package Foo;
use strictures;
use Moo;
use LWP::UserAgent;
has ua => (
is => 'ro',
default => sub { my $ua = LWP::UserAgent->new; $ua->cookie_jar( {} ); return $ua; },
);
sub request {
my ($self, $cookie_jar) = #_;
local $self->{ua}->{cookie_jar} = $cookie_jar;
$self->ua->get('http://www.stackoverflow.com');
}
package main;
my $foo = Foo->new;
my $new_jar = HTTP::Cookies->new;
$foo->request( $new_jar );
So basically I decided to locally overwrite the cookie jar. Unfortunately, when we call get it will still use the cookie jar that is originally inside the UA object.
package Foo;
use strictures;
use Moo;
use LWP::UserAgent;
use HTTP::Cookies;
use Data::Printer;
use feature 'say';
has ua => (
is => 'ro',
default => sub { my $ua = LWP::UserAgent->new; $ua->cookie_jar( {} ); return $ua; },
);
sub request {
my ($self, $cookie_jar) = #_;
say "before local " . $self->{ua}->{cookie_jar};
local $self->{ua}->{cookie_jar} = $cookie_jar;
$self->ua->get('http://www.stackoverflow.com');
print "local jar " . p $self->{ua}->{cookie_jar};
say "after local " . $self->{ua}->{cookie_jar};
}
package main;
use Data::Printer;
use HTTP::Cookies;
my $foo = Foo->new;
say "before outside of local " . $foo->{ua}->{cookie_jar};
my $new_jar = HTTP::Cookies->new;
say "before outside of local " . $new_jar;
$foo->request( $new_jar );
say "after outside of local " . $foo->{ua}->{cookie_jar};
print "global jar " . p $foo->ua->cookie_jar;
__END__
before outside of local HTTP::Cookies=HASH(0x30e1848)
before outside of local HTTP::Cookies=HASH(0x30e3b20)
before local HTTP::Cookies=HASH(0x30e1848)
local jar HTTP::Cookies {
public methods (13) : add_cookie_header, as_string, clear, clear_temporary_cookies, DESTROY, extract_cookies, load, new, revert, save, scan, set_cookie, set_cookie_ok
private methods (3) : _host, _normalize_path, _url_path
internals: {
COOKIES {}
}
}after local HTTP::Cookies=HASH(0x30e3b20)
after outside of local HTTP::Cookies=HASH(0x30e1848)
global jar HTTP::Cookies {
public methods (13) : add_cookie_header, as_string, clear, clear_temporary_cookies, DESTROY, extract_cookies, load, new, revert, save, scan, set_cookie, set_cookie_ok
private methods (3) : _host, _normalize_path, _url_path
internals: {
COOKIES {
stackoverflow.com {
/ {
prov [
[0] 0,
[1] "185e95c6-a7f4-419a-8802-42394776ef63",
[2] undef,
[3] 1,
[4] undef,
[5] 2682374400,
[6] undef,
[7] {
HttpOnly undef
}
]
}
}
}
}
}
As you can see, the HTTP::Cookies object gets localized and replaced correctly. The addresses look totally correct.
But the output of p tells a different story. LWP::UA has not used the local cookie jar at all. That remains a fresh, empty one.
How can I make it use the local one instead?
I have tried using Moo, Moose and classic bless objects. All show this behaviour.
Edit: Since this came up in the comments, let me give a little more background why I need to do this. This is going to be a bit of a rant.
TLDR: Why I do not want alternative solution but understand and fix the problem
I'm building a Dancer2-based webapp that will run with Plack and multiple workers (Twiggy::Prefork - multiple threads in multiple forks). It will allow users to use a service of a third company. That company offers a SOAP webservice. Think of my application as a custom frontend to this service. There is a call to 'log the user in' on the webservice. It returns a cookie (sessionid) for that specific user and we need to pass that cookie with each consecutive call.
To do the SOAP-stuff I am using XML::Compile::WSDL11. Compiling the thing is pretty costly, so I do not want to do that each time a route is handled. That would be way inefficient. Thus the SOAP client will be compiled from the WSDL file when the application starts. It will then be shared by all workers.
If the client object is shared, the user agent inside is shared as well. And so is the cookie jar. That means that if there are two requests at the same time, the sessionids might get mixed up. The app could end up sending wrong stuff to the users.
That's why I decided to localize the cookie jar. If it's a local unique one for a request, it will never be able to interfere with another worker's request that is happening in parallel. Just making a new cookie jar for each request will not cut it. They would still be shared, and might even get lost because they would overwrite each other in the worst case.
Another approach would be to implement a locking mechanism, but that would totally beat the purpose of having multiple workers.
The only other solution I see is using another SOAP-client alltogether. There is SOAP::WSDL, which does not run on newer Perls. according to CPAN testers it breaks on 5.18 andI have verified that. It would be more efficient as it works like a code generator and precreates classes that are cheaper to use than just compiling the WSDL file every time. But since it's broken, it is out of the question.
SOAP::Lite will compile the WSDL, and badly. It is not something anyone should use in production if it can be avoided in my opinion. The only alternative left that I see is to implement the calls without using the WSDL file and parsing the results directly with an XML parser, ignoring the schema. But those are BIG results. It would be very inconvenient.
My conclusion to this rant is that I would really like to understand why Perl does not want to localize the cookie jar in this case and fix that.
Perhaps instead of using local you use the clone and cookie_jar methods of LWP::UserAgent.
...
sub request {
my ($self, $new_cookie_jar) = #_;
my $ua = $self->ua; # cache user agent
if( defined $new_cookie_jar ){
# create a new user agent with the new cookie jar
$ua = $ua->clone;
$ua->cookie_jar( $new_cookie_jar );
}
my $result = $ua->get('http://www.stackoverflow.com');
# allow returning the newly cloned user agent
return ( $result, $ua ) if wantarray;
return $result;
}
If you don't want to do that, you should at least use the methods instead of manipulating the internals of the objects.
...
sub request {
my ($self, $new_cookie_jar) = #_;
my $ua = $self->ua; # cache user agent
my $old_cookie_jar = $ua->cookie_jar( $new_cookie_jar );
my $result = $ua->get('http://www.stackoverflow.com');
# put the old cookie jar back in place
$ua->cookie_jar( $old_cookie_jar );
return $result;
}
I'm pretty new to Perl. While I just created a simple scripts to retrieve a file with
getstore($url, $file);
But how do I know whether the task is done correctly or the connection interrupted in the middle, or authentication failed, or whatever response. I searched all the web and I found some, like a response list, and some talking about useragent stuff, which I totally can't understand, especially the operator $ua->.
What I wish is to an explanation about that operator stuff (I don't even know what -> used for), and the RC code meaning, and finally, how to use it.
Its a lot of stuff so I appreciate any answer given, even just partially. And, thanks first for whoever will to help. =)
The LWP::Simple module is just that: quite simplistic. The documentation states that the getstore function returns the HTTP status code which we can save into a variable. There are also the is_success and is_error functions that tell us whether a certain return value is ok or not.
my $url = "http://www.example.com/";
my $filename = "some-file.html";
my $rc = getstore($url, $filename)
if (is_error($rc)) {
die "getstore of <$url> failed with $rc";
}
Of course, this doesn't catch errors with the file system.
The die throws a fatal exception that terminates the execution of your script and displays itself on the terminal. If you don't want to abort execution use warn.
The LWP::Simple functions provide high-level controls for common tasks. If you need more control over the requests, you have to manually create an LWP::UserAgent. An user agent (abbreviated ua) is a browser-like object that can make requests to servers. We have very detailed control over these requests, and can even modify the exact header fields.
The -> operator is a general dereference operator, which you'll use a lot when you need complex data structures. It is also used for method calls in object-oriented programming:
$object->method(#args);
would call the method on $object with the #args. We can also call methods on class names. To create a new object, usually the new method is used on the class name:
my $object = The::Class->new();
Methods are just like functions, except that you leave it to the class of the object to figure out which function exactly will be called.
The normal workflow with LWP::UserAgent looks like this:
use LWP::UserAgent; # load the class
my $ua = LWP::UserAgent->new();
We can also provide named arguments to the new method. Because these UA objects are robots, it is considered good manners to tell everybody who sent this Bot. We can do so with the from field:
my $ua = LWP::UserAgent->new(
from => 'ss-tangerine#example.com',
);
We could also change the timeout from the default three minutes. These options can also be set after we constructed a new $ua, so we can do
$ua->timeout(30); # half a minute
The $ua has methods for all the HTTP requests like get and post. To duplicate the behaviour of getstore, we first have to get the URL we are interested in:
my $url = "http://www.example.com/";
my $response = $ua->get($url);
The $response is an object too, and we can ask it whether it is_success:
$response->is_success or die $response->status_line;
So if execution flows past this statement, everything went fine. We can now access the content of the request. NB: use the decoded_content method, as this manages transfer encodings for us:
my $content = $response->decoded_content;
We can now print that to a file:
use autodie; # automatic error handling
open my $fh, ">", "some-file.html";
print {$fh} $content;
(when handling binary files on Windows: binmode $fh after opening the file, or use the ">:raw" open mode)
Done!
To learn about LWP::UserAgent, read the documentation. To learn about objects, read perlootut. You can also visit the perl tag on SO for some book suggestions.
I'm working with SOAP::WSDL and another company's custom WSDL file. Every time they make a change for me and I recreate my modules, something breaks. Finding the problem is rather tedious because I don't find a proper way to access the actual request that is sent to the SOAP server.
The only way to get to the request so far has been to use tcpdump in conjunction with wireshark to extract the request and result. That works, but since I don't have root privileges on the dev machine I have to get an admin over every time I want to do that. I feel there must be another way to get to the HTTP::Request object inside the SOAP::WSDL thing. But if the server returns a fault, I don't even have a response object, but rather a SOAP::WSDL::SOAP::Typelib::Fault11 object that has no visible relation to the request.
I've also tried using the debugger but I'm having trouble finding the actual request part. I've not yet understood how to tell the debuger to skip to a specific part deep inside a complex number of packages.
I stumbled across this, having the same problem myself. I found the answer is using both options that raina77ow listed.
$service->outputxml(1);
returns the whole SOAP envelope xml, but this needs to be combined with
$service->no_dispatch(1);
With no_dispatch set, the SOAP request is printed, instead of the reply from the request. Hopefully this can help others.
Have you tried to use SOAP::WSDL::Client tracing methods - and outputxml in particular? It returns the raw SOAP envelope which is to be sent to the server.
You can also use no_dispatch configuration method of SOAP::WSDL package:
When set, call() returns the plain request XML instead of dispatching
the SOAP call to the SOAP service. Handy for testing/debugging.
I found a way to at least print out the generated XML code.
First, I looked at SOAP::WSDL::Client as raina77ow suggested. That wasn't what I needed, though. But then I came across SOAP::WSDL::Factory::Serializer. There, it says:
Serializer objects may also be passed directly to SOAP::WSDL::Client
by using the set_serializer method.
A little fidgeting and I came up with a wrapper class for SOAP::WSDL::Serializer::XSD which is the default serializer used by SOAP::WSDL. A look at the code helped, too.
Here's the module I wrote. It uses SOAP::WSDL::Serializer::XSD as a base class and overloads the new and serialize methods. While it only passes arguments to new, it grabs the returned XML from serialize and prints it, which suffices for debugging. I'm not sure if there's a way to put it somewhere I can easily get it from.
package MySerializer;
use strict;
use warnings;
use base qw(SOAP::WSDL::Serializer::XSD);
sub new {
my $self = shift;
my $class = ref($self) || $self;
return $self if ref $self;
# Create the base object and return it
my $base_object = $class->SUPER::new(#_);
return bless ($base_object, $class);
}
sub serialize {
my ($self, $args_of_ref) = #_;
# This is basically a wrapper function that calls the real Serializer's
# serialize-method and grabs and prints the returned XML before it
# giving it back to the caller
my $xml = ref($self)->SUPER::serialize($args_of_ref);
print "\n\n$xml\n\n"; # here we go
return $xml;
}
1;
And here's how I call it:
my $serializer = MySerializer->new();
$self->{'_interface'} = Lib::Interfaces::MyInterface->new();
$self->{'_interface'}->set_serializer($serializer); # comment out to deactivate
It's easy to deactivate. Only put a comment in the set_serializer line.
Of course printing a block of XML to the command line is not very pretty, but it gets the job done. I only need it once in a while why coding/testing, so this is fine I guess.
I couldn't really figure out how to use WWW::Mechanize::TreeBuilder. Basically I get a HTML page using WWW::Mechanize. There is a //div[#class='cars'] whose text I want to extract.
I tried:
my $mech = WWW::Mechanize->new();
$mech->get('the url');
WWW::Mechanize::TreeBuilder->meta->apply($mech);
$mech->get_ok('//div[#class="cars"]');
print $mech->look_down(_tag => 'p')->as_trimmed_text . "\n";
It says:
Can't locate object method "get_ok" via package "Class::MOP::Class::__ANON__::SERIAL::2" at orpi_crawler.pl
get_ok is from Test::WWW::Mechanize which you neglected to load. Read the synopsis of WWW::Mechanize::TreeBuilder carefully.