I am using Perl WWW::Mechanize::Chrome to automate a JS-heavy website.
In response to a user click, the page makes many requests, one of which loads a JSON file using XHR.
Is there some way to save this particular JSON data to a file?
To intercept requests like that, you generally need to use the webRequest API to filter and retrieve specific responses. I do not think you can do that via WWW::Mechanize::Chrome.
WWW::Mechanize::Chrome tries to give you the content of all requests, but Chrome itself does not make the content of XHR requests available (https://bugs.chromium.org/p/chromium/issues/detail?id=457484). So the approach I take in, for example, Net::Google::Keep is to replay the XHR requests using plain Perl LWP requests, copying the cookies and parameters from the Chrome requests.
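The replay idea can be sketched outside Perl as well. The following is a minimal Python illustration, not the Net::Google::Keep code: build_replay_request and save_json_response are hypothetical helper names, and the cookie names, header and URL in the usage note are placeholders for whatever you copy from the browser session.

```python
import json
import urllib.request


def build_replay_request(url, cookies, extra_headers=None):
    """Build a GET request carrying cookies copied from the browser session."""
    headers = {"Accept": "application/json"}
    headers.update(extra_headers or {})
    if cookies:
        # Cookie header format: "name1=value1; name2=value2"
        headers["Cookie"] = "; ".join(f"{k}={v}" for k, v in cookies.items())
    return urllib.request.Request(url, headers=headers, method="GET")


def save_json_response(request, path, opener=urllib.request.urlopen):
    """Send the request, check that the body parses as JSON, and write it to path."""
    with opener(request) as response:
        body = response.read()
    data = json.loads(body)  # raises ValueError if the body is not JSON
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2)
    return data
```

Usage would look like save_json_response(build_replay_request(xhr_url, {"sid": "..."}), "data.json"), where xhr_url and the cookie values come from the request you observed in Chrome.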
Please note that the official support forum for WWW::Mechanize::Chrome is https://perlmonks.org , not StackOverflow.
I need to scrape a webpage.
I specifically need to extract the section "USDT Funding Market" on the page.
$WebExtract = Invoke-WebRequest -Uri "https://www.kucoin.com/margin/lend/USDT"
However, neither $WebExtract.content nor $WebExtract.AllElements contains the "USDT Funding Market" section.
The reason for this is that your initial web request only returns the HTML of the page. Many modern web applications like this one use JavaScript under the hood to reach out to their APIs and fill the page with data after the initial HTML has loaded.
The good news is that with public websites like this you can usually hit the underlying APIs without worrying about authentication, which actually makes parsing the data easier, as it's typically in JSON format.
Try opening the debug console in your browser (usually F12), open the Network tab and refresh the page. You can typically find these API requests/endpoints in there.
Simply copy that URL out and invoke the web request against the API instead, for example:
$request = Invoke-WebRequest -Uri 'https://www.kucoin.com/_api/currency/prices?base=USD&targets='
$json = $request.Content | ConvertFrom-Json
$json.data
This might not be the specific API endpoint/data you're after, but hopefully this points you in the right direction. If you poke around in the console you should be able to find the one you're looking for.
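For readers who are not using PowerShell, the same fetch-and-parse step can be sketched in Python. This is only an illustration: extract_data and fetch_data are hypothetical helper names, the "data" key mirrors the $json.data access above, and the site may reject requests that lack browser-like headers.

```python
import json
import urllib.request


def extract_data(body):
    """Parse a JSON body and return its "data" member, like $json.data above."""
    payload = json.loads(body)
    return payload["data"]


def fetch_data(url, opener=urllib.request.urlopen):
    """Request the API endpoint directly and return the parsed "data" member."""
    with opener(url) as response:
        return extract_data(response.read())
```

Calling fetch_data with the endpoint URL copied from the Network tab would then return the parsed structure instead of the page HTML.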
EDIT:
Apologies for the misleading information here, I didn't notice that these numbers were updating in real time on the site. The above is still pretty common for a lot of websites so I'll leave it, however on sites that are this responsive/live another common technology used is Web Sockets.
If you have a look at the last lines of the console, you should be able to see a request ending with 'transport=websocket' to a URL like this:
wss://ws-web.kucoin.com/socket.io/?token=...%3D%3D&format=json&acceptUserMessage=false&connectId=connect_welcome&EIO=3&transport=websocket
If you select this line and head over to the Messages tab in your browser console/debugger, you'll be able to see the web socket messages being returned.
This looks like the data you're after, but I'm not hugely familiar with querying Web Sockets through PowerShell.
Simply invoking a web request won't work here, as it's using the web sockets protocol to handle the communication. You would also need to find a way to get a valid token for opening a web sockets connection (likely just a web request for that).
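As a rough sketch of the first half of that, here is how the connection URL could be assembled once you have a token, shown in Python. The query parameters are copied from the URL above; build_websocket_url is a hypothetical helper, how the token is obtained is not covered here, and actually opening the connection would still need a WebSocket client library.

```python
from urllib.parse import urlencode

# Endpoint taken from the request seen in the browser console.
WS_ENDPOINT = "wss://ws-web.kucoin.com/socket.io/"


def build_websocket_url(token, connect_id="connect_welcome"):
    """Assemble the wss:// URL, percent-encoding the token (e.g. '=' becomes %3D)."""
    params = {
        "token": token,
        "format": "json",
        "acceptUserMessage": "false",
        "connectId": connect_id,
        "EIO": "3",
        "transport": "websocket",
    }
    return WS_ENDPOINT + "?" + urlencode(params)
```

The percent-encoding matters because the token appears to end in '==' padding, which shows up as %3D%3D in the observed URL.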
Perhaps these posts will help:
http://wragg.io/powershell-slack-bot-using-the-real-time-messaging-api/
How to use a Websocket client to open a long-lived connection to a URL, using PowerShell V2?
I'm very new to API testing.
I'm trying to make use of Google Chrome's developer tools to understand and explore this subject.
Question 1:
Is it possible to get the response (possibly in JSON format) of a simple GET request using chrome developer tools?
What I'm currently doing is:
Open chrome developer tools
Go to Network tab
Clear existing logs
Send a GET request simply by hitting a URL, e.g. https://stackoverflow.com/questions/ask
Check the corresponding docs loaded
Question 2:
What is the relevance of the "Response Headers" shown in the image above? I mean, am I correct to think that this is the response I am getting after doing the GET request?
Any help or references you can give are much appreciated!
If you want to test a REST API, I suggest you get Postman, which is meant for that purpose.
Going to your questions:
Question 1: Is it possible to get the response (possibly in JSON format) of a simple GET request using chrome developer tools?
The first point to make clear is that it is the server that decides whether or not to send a JSON response to the browser. The browser cannot choose to see an arbitrary response as JSON.
If you send a GET request and the server responds with a JSON object or JSON array and the Content-Type header is set to application/json, you will see that response already formatted in the main window of the browser.
If the Content-Type is set to text/html, for example, then you will still get the JSON text as the response in the main window, but it won't be nicely formatted. Depending on how the response was sent, sometimes you can see it nicely formatted by right-clicking the browser window and selecting View page source.
For this you don't need the developer tools unless you want to see how long it took to receive the response, check the headers for some specific value, and so on. They have nothing to do with receiving the response or rendering it on screen.
The developer tools are more useful if you are working with JavaScript/jQuery and/or sending AJAX requests (GET or POST). In these cases you can debug the function and also inspect the AJAX request to check what actually went out from your browser and what was received as a response.
Question 2: What is the relevance of the "Response Headers" shown in the image above? I mean, am I correct to think that this is the response I am getting after doing the GET request?
In the response you get two things: the headers and the content. The JSON objects you see are part of the content, not the headers.
The headers tell the browser, for example, that the body is JSON (vs. an HTML document or something different), along with other information such as cache-control or how long the body is.
Search for "HTTP headers" for more information on which headers are standard.
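To make the headers/content split concrete, here is a small Python sketch that takes a raw HTTP response as text, separates the two parts, and only parses the body as JSON when the Content-Type header says so. The function names and the sample response in the usage note are made up for illustration; real clients do this for you.

```python
import json


def split_response(raw):
    """Split a raw HTTP response into (status_line, headers, body)."""
    # A blank line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


def decode_body(headers, body):
    """Parse the body as JSON only when the Content-Type header says it is JSON."""
    media_type = headers.get("content-type", "").split(";")[0].strip().lower()
    if media_type == "application/json":
        return json.loads(body)
    return body  # any other media type is returned as plain text
```

For a response whose headers include "Content-Type: application/json", decode_body returns a parsed object; with "text/html" the same bytes come back as an unparsed string, which matches the browser behaviour described above.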
To answer your questions narrowly:
Is it possible to get the response (possibly in JSON format) of a simple GET request using chrome developer tools?
Yes! Just click the Response tab, which is to the right of the Headers tab that's open in your screenshot.
What is the relevance of the "Response Headers" shown in the image above? I mean, am I correct to think that this is the response I am getting after doing the GET request?
Yes, these are the HTTP headers that were sent with the response to your request.
The broader question here is "how do I test a REST API?" DevTools is good for manual testing, but there are automated tools that can make it more efficient. I'll leave that up to you to learn more about that broad topic.
Can we send POST HTTP requests in Google Chrome when working with REST services?
I have tried a few extensions, but I need to do it directly from the Chrome browser.
I think using the URL bar will always result in a GET request.
To send POST requests from a browser, set up an HTML <form> with method="POST", use the action attribute for the REST-URL and input tags for other parameters.
You can send POST and GET requests from JavaScript in the same way the browser does, using XMLHttpRequest.
You can use request headers to pass extra information as key/value pairs.
Here is a tutorial on how you can do it - Send POST data using XMLHttpRequest
But it would be better to use the Chrome extension Postman, which is very extensive and clean for testing REST services.
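Outside the browser, the same kind of POST with key/value headers can be sketched in a few lines of Python. This is an illustration only: build_post_request is a hypothetical helper, and the URL, field names and header in the usage note are placeholders.

```python
import urllib.request
from urllib.parse import urlencode


def build_post_request(url, fields, headers=None):
    """Build a form-encoded POST request with optional extra headers."""
    body = urlencode(fields).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    # Extra information travels as header name/value pairs.
    for name, value in (headers or {}).items():
        request.add_header(name, value)
    return request
```

Passing the result to urllib.request.urlopen would send it, for example build_post_request("https://example.com/api/items", {"name": "widget"}, {"Accept": "application/json"}).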
I am working on a REST API with OAuth2 authorization.
For the OAuth2 server I use https://github.com/bshaffer/oauth2-server-php
The PHP documentation says here: http://php.net/manual/en/wrappers.php.php
Prior to PHP 5.6, a stream opened with php://input could only be read once; the stream did not support seek operations. However, depending on the SAPI implementation, it may be possible to open another php://input stream and restart reading. This is only possible if the request body data has been saved. Typically, this is the case for POST requests, but not other request methods, such as PUT or PROPFIND.
In short, this means that the body of a POST request can be read twice, but the body of a PUT request cannot.
But the OAuth2 server reads it first, here: https://github.com/bshaffer/oauth2-server-php/blob/develop/src/OAuth2/Request.php#L114
So when I read the raw body in the Yii2 Request, it is empty (only on PUT; on POST and PATCH it is fine and can be read twice).
https://github.com/yiisoft/yii2/blob/master/framework/web/Request.php#L345
I know that this is expected behavior rather than a bug. But what would be the solution for this?
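Before you create that auth server, run this (depending on where you do authentication, you can use beforeAction(), or even init()):
// Read the raw body through Yii first, before anything else consumes php://input.
$content = Yii::$app->request->rawBody;
// Request here is OAuth2\Request from oauth2-server-php, not Yii's request class.
$authentication = Request::createFromGlobals();
if ($content) {
    // Hand the saved body to the OAuth2 request so it does not re-read the stream.
    $authentication->content = $content;
}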
Now, I don't know how/where you use the component, so it might not fully work, but in theory it should.
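The underlying idea is independent of PHP: read the one-shot stream exactly once, keep the bytes, and give every consumer the saved copy. A language-neutral sketch in Python, where CachedBody is a hypothetical wrapper and io.BytesIO merely stands in for a PUT body stream that yields nothing on a second read:

```python
import io


class CachedBody:
    """Read a one-shot stream once and hand the same bytes to every consumer."""

    def __init__(self, stream):
        self._stream = stream
        self._body = None

    def read(self):
        # Only the first call touches the underlying stream.
        if self._body is None:
            self._body = self._stream.read()
        return self._body


raw_stream = io.BytesIO(b'{"name": "new value"}')  # stand-in for a PUT body
cached = CachedBody(raw_stream)
first = cached.read()        # e.g. the auth layer reads the body
second = cached.read()       # e.g. the framework reads it again
leftover = raw_stream.read() # the raw stream itself is now exhausted
```

Both consumers see the same bytes, while a direct second read of the raw stream comes back empty, which is the symptom described in the question.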
I want to capture how parameters are being sent. Usually what I do is to make a request and check on Firebug's params tab what are the parameters sent. However, when I try to do this on the following site (http://www.infraero.gov.br/voos/index_2.aspx), it doesn't work - I can't see what are the parameters in order to repeat this request using curl. How can I get it? I'm not sure but I think that cookies are being used.
EDIT
I was able to get the request content, but couldn't understand it. It seems it uses javascript to generate the proper request. How can I reproduce this request via cURL?
Did you see this previous question: cURL post data to asp.net page? That might answer the question right there (all I did was search "ASP.NET cURL"). And this one, Unable to load ASP.NET page using Python urllib2, talks about Python, but it approaches the problem in a way that should translate to cURL.
But for my $0.02, I wouldn't bother trying to untangle ASP.NET's __VIEWSTATE and JavaScript. Is it an absolute requirement that you use cURL?
I think you would be better off using a client that works more like a real browser and understands JavaScript. That's a bit of work, but it isn't as bad as it sounds. I've done this before with http://watirwebdriver.com/ and a short Ruby script. Here's how to do it with Python and Mechanize (this is probably a bit more lightweight).
http://phantomjs.org/ is another option that you script using javascript. If you Google "Scraping ASP.NET" you will see that this is a common problem.
You didn't say how you want it done, but you can send the request with curl simply with curl -d 'name1=contents1&name2=contents2' [TARGETURL] etc. (quote the data so the shell does not interpret the &).
Note that you probably first need to fetch the main page, extract the "__VIEWSTATE" form field and submit its (VERY huge) contents back to get your submission accepted.
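The extract-and-submit-back step can be sketched as follows, in Python for brevity. This is an assumption-laden illustration: it collects every hidden input rather than only __VIEWSTATE, HiddenFieldParser and build_postback_body are hypothetical names, and the visible field name in the usage note is invented, since the real form fields on that site are not known here.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of all <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def build_postback_body(page_html, form_values):
    """Merge the page's hidden fields with your own values into a POST body."""
    parser = HiddenFieldParser()
    parser.feed(page_html)
    fields = dict(parser.fields)
    fields.update(form_values)  # your own inputs override hidden defaults
    return urlencode(fields)
```

The returned string is what you would pass to curl -d (or send as the POST body from any client), for example build_postback_body(page_html, {"city": "SP"}) where "city" is a made-up field name. Cookies from the first request would still need to be sent back as well.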