Parsing Cyrillic sites with Zend Framework: encoding issue

$url = "http://www.kinopoisk.ru/picture/791547/";
require_once 'Zend/Http/Client.php';
require_once 'Zend/Dom/Query.php';
$client = new Zend_Http_Client($url, array(
    'maxredirects' => 0,
    'timeout' => 30,
    'useragent' => 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.36 Safari/535.7',
));
$response = $client->request();
$html = $response->getBody();
$dom = new Zend_Dom_Query($html);
$title = $dom->query('title');
$titleText = $title->current()->textContent;
echo $titleText;
// Locally returns "Постеры: ВАЛЛ·И (WALL·E)"
// Remotely returns "Ïîñòåðû: ÂÀËË·È (WALL·E)"
.htaccess setting: AddDefaultCharset utf-8
Response headers on the remote site:
Content-Encoding gzip
Vary Accept-Encoding
The local response has no such headers.
So I think it is the server's fault. How can I resolve this?
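The garbled remote output is exactly what Windows-1251 bytes look like when they are read as ISO-8859-1, which suggests the page's real charset is being lost before the text is displayed. Gzip compression is a separate layer and would not produce this pattern on its own. A minimal sketch in Python (used only as a neutral illustration, since the code in this thread is PHP; that the site serves Windows-1251 is an assumption inferred from the output, not something the thread confirms) reproduces both results:

```python
import gzip

expected = "Постеры: ВАЛЛ·И (WALL·E)"

# Assumption: the site serves Windows-1251 (cp1251) bytes, gzip-compressed.
wire_bytes = gzip.compress(expected.encode("cp1251"))

# Content-Encoding: gzip is undone first; this step never changes the charset.
body = gzip.decompress(wire_bytes)

# Reading the bytes with the wrong charset gives the "remote" output.
garbled = body.decode("latin-1")

# Reading them with the right charset gives the "local" output.
correct = body.decode("cp1251")

print(garbled)
print(correct)
```

If that assumption holds, one possible fix on the PHP side would be to convert the body to UTF-8 (for example with iconv) before handing it to the DOM parser.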

Related

How to prevent a Guzzle request from being redirected by the client?

I already set the allow_redirects parameter to false, but my request still gets redirected.
$client = new \GuzzleHttp\Client([
    'headers' => [
        'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36',
    ],
    'verify' => false,
]);
$response = $client->post(
    "{$value->url}/wp-json/apg/v1/webhooks",
    [
        'form_params' => [
            'token_auth' => $this->setting->token_auth,
            'webhooks_type' => 'theme',
            'theme' => $theme,
            'url_download' => url('/storage/themes'),
        ],
        'allow_redirects' => false,
    ]
);
$jsonResponse = json_decode($response->getBody()->getContents());

Why does LWP::UserAgent succeed and Mojo::UserAgent fail?

If I make a request like this:
my $mojo_ua = Mojo::UserAgent->new->max_redirects(5);
$mojo_ua->inactivity_timeout(60)->connect_timeout(60)->request_timeout(60);
$mojo_ua->transactor->name('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36');
my $headers = {
    'Accept' => 'application/json',
    'Accept-Language' => 'en-US,en;q=0.5',
    'Connection' => 'keep-alive',
    'User-Agent' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
    'x-csrf-token' => 'Fetch',
    'Accept-Encoding' => 'gzip, deflate, br',
    'DataServiceVersion' => '2.0',
    'MaxDataServiceVersion' => '2.0',
    'Referer' => 'https://blah.blas.com/someThing.someThing'
};
my $url = Mojo::URL->new('https://blah.blah.com/irj/go/sap/FOO_BAR_BAZ/');
my $tx = $mojo_ua->get($url, $headers);
$tx = $mojo_ua->start($tx);
my $res = $tx->result;
the request times out. But if I take the exact same request, built in the same way, and do this:
my $lwp_ua = LWP::UserAgent->new;
my $req = HTTP::Request->parse( $tx->req->to_string );
$req->uri("$url");
my $res = $lwp_ua->request($req);
it succeeds.
It happens in a few cases that Mojo::UserAgent fails, and LWP::UserAgent succeeds with exactly the same transaction, and I'm starting to get curious.
Any idea as to why?
Your call to
$mojo_ua->get($url, $headers)
has already sent the HTTP request and received the response from the server, errored, or timed out. You don't need to call
$mojo_ua->start($tx)
as well, and that statement should be removed.
If you really want to first build the transaction and then start it, you need
my $tx = $mojo_ua->build_tx(GET => $url, $headers);
$tx = $mojo_ua->start($tx);
but I don't see any reason why you would need to do it this way.

Perl: Need an LWP & HTTP::Request POST code that actually works

I have been scratching my head trying to get LWP and HTTP::Request to actually pass a POST parameter to a web server. The web server sees that the request was a POST, but it does not pick up the passed parameters. I have been searching all day and have tried different things, but I have yet to find something that works. (The web server itself is working: I can send POST requests to it manually. When I run the whole script I get a '200' status, but I do not see any posted elements.) Any help would be appreciated. Thanks.
my $ua2 = LWP::UserAgent->new;
$ua2->agent("Mozilla/5.0 (compatible; MSIE 6.0; Windows 98)");
my $req2 = HTTP::Request->new(POST => "$url", [ frm-advSearch => 'frmadvSearch' ]);
$req2->content_type('text/html');
my $res2 = $ua2->request($req2);
$http_stat = substr($res2->status_line,0,3);
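In your code the array reference is passed as the headers argument of HTTP::Request->new, so nothing ends up in the request body, and the content type is set to text/html rather than a form type. Use LWP::UserAgent's post method instead, which form-encodes the parameters into the body:
my $res = $ua->post($url,
    Content => [
        'frm-advSearch' => 'frmadvSearch',
    ],
);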
which is short for
use HTTP::Request::Common qw( POST );
my $req = POST($url,
Content => [
'frm-advSearch' => 'frmadvSearch',
],
);
my $res = $ua->request($req);
Here's a Mojo::UserAgent example, which I find easier to debug:
use Mojo::UserAgent;
my $ua = Mojo::UserAgent->new;
$ua->transactor->name( 'Mozilla/5.0 (compatible; MSIE 6.0; Windows 98)' );
my $url = 'http://www.example.com/form/';
my $tx = $ua->post( $url, form => { 'frm-advSearch' => 'frmadvSearch' } );
say $tx->req->to_string;
The transaction in $tx knows about the request so I can look at that:
POST /form/ HTTP/1.1
Content-Type: application/x-www-form-urlencoded
User-Agent: Mozilla/5.0 (compatible; MSIE 6.0; Windows 98)
Accept-Encoding: gzip
Host: www.example.com
Content-Length: 26

frm-advSearch=frmadvSearch
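The Content-Type and Content-Length in that dump follow directly from form-encoding the single field. As a cross-check in Python (used here only as a neutral illustration; the thread's own code is Perl), the standard library produces the same body without sending anything over the network:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Same field and URL as in the Mojo::UserAgent example above.
fields = {"frm-advSearch": "frmadvSearch"}
body = urlencode(fields).encode("ascii")

# Building a Request does not contact the server; it only describes the POST.
req = Request(
    "http://www.example.com/form/",
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

print(body.decode("ascii"))  # the form-encoded body
print(len(body))             # the value a client would send as Content-Length
print(req.get_method())
```

If a server reports a POST with no parameters, comparing the body and content type it actually received against this shape is a quick first check.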

How do I maintain cookies across many WWW::Mechanize runs?

use WWW::Mechanize;
use strict;
my $agent = WWW::Mechanize->new(cookie_jar => {ignore_discard => 0});
$agent->add_header('User-Agent' => 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:20.0) Gecko/20100101 Firefox/20.0');
$agent->get($url);
my $content = $agent->content;
The cookie_jar attribute expects an HTTP::Cookies object:
WWW::Mechanize->new(
    cookie_jar => HTTP::Cookies->new(
        file => 'lwp_cookies.dat',
        autosave => 1,
    )
);
Your mistake was passing a plain hashref: that gives you a temporary in-memory cookie store, which is destroyed when Mechanize ends.
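The difference between keeping cookies in memory and persisting them to a file, and the effect of ignore_discard on session cookies, can be sketched with Python's http.cookiejar, whose LWPCookieJar reads and writes the libwww-perl cookie file format. This is an illustration under assumptions, not the Perl code from the thread: the cookie name, value, and domain are made up, and the helper names are hypothetical.

```python
import os
import tempfile
from http.cookiejar import Cookie, LWPCookieJar


def make_session_cookie(name, value, domain):
    """Build a session cookie (no expiry, so it is flagged as discard)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def round_trip(ignore_discard):
    """Save one session cookie to a file, reload it in a fresh jar, return names."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "lwp_cookies.dat")
        jar = LWPCookieJar(path)
        jar.set_cookie(make_session_cookie("sid", "abc123", "www.example.com"))
        jar.save(ignore_discard=ignore_discard)

        # A second jar stands in for the next run of the script.
        reloaded = LWPCookieJar(path)
        reloaded.load(ignore_discard=ignore_discard)
        return [cookie.name for cookie in reloaded]


print(round_trip(ignore_discard=False))
print(round_trip(ignore_discard=True))
```

With ignore_discard off, the session cookie is not written to the file, so the second jar comes back empty; with it on, the cookie survives the round trip. If the site you are scraping relies on session cookies, a file-backed jar alone may therefore not be enough.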

How do I simulate this particular POST request in Mechanize?

The post request is as follows.
$bot->add_header(
'Host'=>'www.amazon.com',
'User-Agent'=>'application/json, text/javascript, */*',
'Accept'=>'application/json, text/javascript, */*',
'Accept Language'=>'en-us,en;q=0.5',
'Accept Encoding'=>'gzip, deflate',
'DNT'=>'1',
'Connection'=>'keep-alive',
'Content type'=>'application/x-www-form-urlencoded; charset=UTF-8',
'X-Requested with'=>'XMLHttpRequest',
'Referer'=>'https://www.amazon.com/gp/digital/fiona/manage?ie=UTF8&ref_=gno_yam_myk',
'Content length'=>'44',
'Cookie'=>'how do i put the cookie value');
POST parameters in my request:
sid: how do I get the session ID?
new email: mailhost@mail.com
My code to logon:
use WWW::Mechanize;
use HTTP::Cookies;
use HTML::Form;
use WWW::Mechanize::Link;
my $bot = WWW::Mechanize->new();
$bot->agent_alias( 'Linux Mozilla' );
# Create a cookie jar for the login credentials
$bot->cookie_jar( HTTP::Cookies->new(
    file => "cookies.txt",
    autosave => 1,
    ignore_discard => 1,
) );
# Connect to the login page
my $response = $bot->get( 'https://www.amazon.com/gp/css/homepage.html/' );
# Get the login form. You might need to change the number.
$bot->form_number(3);
# Enter the login credentials.
$bot->field( email => '' );
$bot->field( password => '' );
$response = $bot->click();
#print $response->decoded_content;
$bot->get( 'https://www.amazon.com/gp/yourstore/home?ie=UTF8&ref_=topnav_ys' );
print $bot->content();
$bot->post('https://www.amazon.com/gp/digital/fiona/du/add-whitelist.html/ref=kinw_myk_wl_add', [sid => 'id', email => 'v2@d.com']);
Data captured:
Host=www.amazon.com
User-Agent=Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0
Accept=application/json, text/javascript, */*
Accept-Language=en-us,en;q=0.5
Accept-Encoding=gzip, deflate
DNT=1
Connection=keep-alive
Content-Type=application/x-www-form-urlencoded; charset=UTF-8
X-Requested-With=XMLHttpRequest
Referer=https://www.amazon.com/gp/digital/fiona/manage?ie=UTF8&ref_=gno_yam_myk
Content-Length=39
Cookie=session-id-time=2082787201l; session-id
Pragma=no-cache
Cache-Control=no-cache
POSTDATA=sid=id&email=v%40d.com
Error message:
Error POSTing https://www.amazon.com/gp/digital/fiona/du/add-whitelist.html/ref=kinw_myk_wl_add: InternalServerError at logon.pl line 81
See the post method in the WWW::Mechanize documentation.
$bot->post($url, [sid => 'id', email => 'v@d.com']);
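One detail worth checking against the captured data: several header names in the add_header call in the question contain spaces ('Accept Language', 'Content type'), whereas the captured request uses hyphens. HTTP header field names cannot contain spaces. A small Python check (a language-neutral sketch; the helper name invalid_header_names is hypothetical) lists the offending names:

```python
import re

# RFC 7230 "token": the characters allowed in an HTTP header field name.
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def invalid_header_names(names):
    """Return the names that are not valid HTTP header field names."""
    return [name for name in names if not TOKEN.fullmatch(name)]


# Header names exactly as written in the add_header call in the question.
question_names = [
    "Host", "User-Agent", "Accept", "Accept Language", "Accept Encoding",
    "DNT", "Connection", "Content type", "X-Requested with", "Referer",
    "Content length", "Cookie",
]

print(invalid_header_names(question_names))
```

Replacing the space with a hyphen in each reported name brings them in line with the captured request.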