Invoke-WebRequest is unable to POST data on Single Page App (1 URI) and KeepAlive is disabled - powershell

TL;DR
I need to submit multiple forms on a single page app that reloads its content under one URL. The server closes the connection after every request, and I think this is preventing my POST from going through. The main problem is that every request is handled as if the page were in its first state, so I can't reach states further along in the process.
What I'm trying to accomplish
I am trying to automate a process that requires me to take the following steps in order:
Navigate to webpage. (GET)
Click on a button that reloads the page with new data but uses same URL. (POST)
Enter text into a field on new page.
Click on the form to submit the text. (POST)
... perform unrelated admin tasks....
I'm trying to automate this process using the Invoke-WebRequest cmdlet in PowerShell, following steps similar to those found in the PowerShell Cookbook regarding Facebook login.
When I run an Invoke-WebRequest from a completely fresh PowerShell Session I get a response but I can't reuse that session ever again. To make another request I need to create a new -SessionVariable or use -DisableKeepAlive.
The server always returns Connection: close in the response, even though it is using HTTP/1.1, and it's not my site so I can't change this.
So how can I go about establishing a connection to the server that I can reuse to POST the form data? I feel like it should be doable because it is clearly happening on the WebPage itself.
When I go to the WebPage, open the Developer Tools in Chrome and step through the process the header contains this in the Form Data field:
RAW
ggt_textbox%2810007%29=&action=adminloginbtn&ggt_hidden%2810008%29=2
PARSED and DECODED
ggt_textbox(10007):
action:adminloginbtn
ggt_hidden(10008):2
If I try to do something like this:
Invoke-WebRequest $uri -SessionVariable session -Verbose -Method POST -Body "ggt_textbox%2810007%29=&action=adminloginbtn" -DisableKeepAlive
It returns the page I'm expecting in step 2. So I performed steps 3 and 4 in Chrome to try to do the same thing. I get the following Form Data in Chrome Dev Tools:
RAW
ggt_textbox%2810006%29=textIentered&action=loginbtn&ggt_hidden%2810008%29=3
PARSED
ggt_textbox(10006):textIentered
action:loginbtn
ggt_hidden(10008):3
So that made me think I could do something like this:
Invoke-WebRequest $uri -WebSession $session -Verbose -Method POST -Body "ggt_textbox%2810006%29=textIentered&action=loginbtn&ggt_hidden%2810008%29=3"# -DisableKeepAlive
But since the main page and the login page use the same URI it tries to POST to a form that doesn't exist because it's looking at the very first page.
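The RAW and PARSED form data shown above differ only by URL encoding. A minimal sketch of building the raw body from the parsed fields, so the hidden field is not dropped by accident; ConvertTo-FormBody is a hypothetical helper name, and the field names are the ones captured in Chrome:

```powershell
# Hypothetical helper: URL-encodes each key and value and joins them
# into an application/x-www-form-urlencoded body.
function ConvertTo-FormBody {
    param([System.Collections.IDictionary]$Fields)
    $pairs = foreach ($entry in $Fields.GetEnumerator()) {
        $key   = [uri]::EscapeDataString([string]$entry.Key)
        $value = [uri]::EscapeDataString([string]$entry.Value)
        "$key=$value"
    }
    $pairs -join '&'
}

# An ordered dictionary keeps the fields in the order the browser sent them.
$step2 = [ordered]@{
    'ggt_textbox(10007)' = ''
    'action'             = 'adminloginbtn'
    'ggt_hidden(10008)'  = '2'
}
$body = ConvertTo-FormBody $step2
```

The resulting $body can be passed to -Body as in the commands above. Whether the server also needs state from the previous response is not something this sketch can tell.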
I did some more digging and found that when I perform this same action from the webpage itself, the server returns a 302 Moved Temporarily status code. For the first time, the response header actually has a cookie in it (the connection is still closed). The browser then appears to do a GET request using the new cookie, and I'm now logged into the admin page.
So I think I have two problems I need to get around:
How can I get to the form that exists after I click the first button since they use the same URI?
How can I get around the 302 status, since I'm only getting back a header and nothing else? I think I need to do a GET request using the cookie from the header, but I'm not sure how to specify a cookie with Invoke-WebRequest. I think I would need to use the -Headers parameter and specify Cookie: COOKIENAME=CookieID
I think most of all I need to get through my first question and then from there I can start working towards my second.
All help is appreciated and I can provide any header/source needed but the web page is super simple so there is not a whole lot going on in the front end other than a couple of buttons and a logo with a little bit of inline JavaScript.
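On the cookie question, an alternative to a hand-written Cookie header is to put the cookie into a WebRequestSession and pass that with -WebSession. This is only a sketch: New-SessionWithCookie is a hypothetical helper, and example.com, COOKIENAME and CookieID are placeholders for the real host and the values seen in the 302 response.

```powershell
# Hypothetical helper: returns a web session that already carries one cookie.
function New-SessionWithCookie {
    param([string]$Name, [string]$Value, [string]$Domain)
    $session = New-Object Microsoft.PowerShell.Commands.WebRequestSession
    # Cookie(name, value, path, domain); the domain must not be empty.
    $cookie = New-Object System.Net.Cookie($Name, $Value, '/', $Domain)
    $session.Cookies.Add($cookie)
    $session
}

# Placeholder values, not taken from the real site.
$session = New-SessionWithCookie -Name 'COOKIENAME' -Value 'CookieID' -Domain 'example.com'

# The session would then be reused for the follow-up GET, for example:
# Invoke-WebRequest -Uri 'https://example.com/' -WebSession $session
```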
EDIT
After doing some additional reading about 302 and redirects I found out that shouldn't be a problem. The reason for this is explained in this question.

I figured out my problem. The inline JavaScript validates that the length of the string is greater than 0 before submitting the login. I don't think there is a way to bypass the client-side validation easily in a script.

Related

Verifying login to website via powershell - do I need to scrape regardless?

I need to test if a set of default credentials work for a site programmatically.
Right now I have two options at my disposal. The app will login if I pass a url like - https://this.com/app/loginprocess.aspx?username=user1&password=pass1.
I can also use POST and Invoke-WebRequest to sign into the site. Something like:
$UserCredentials = Get-Credential
$InvokeResponse = Invoke-WebRequest $Url -SessionVariable MyInvokeSession
$form = $InvokeResponse.Forms[0]
$form.fields['Username'] = $UserCredentials.UserName
$form.fields['Password'] = $UserCredentials.GetNetworkCredential().Password
$InvokeResponse = Invoke-WebRequest -Uri ($Url + $form.Action) -WebSession $MyInvokeSession -Method POST -Body $form.Fields
My first question is - no matter which of the two methods I use, I must scrape the webpage to check if the login was successful, correct?
My second question is - Is scraping the page the only way to check if the login worked? If so what is a good way to scrape for a constant value?
Thanks.
The urls are always structured the same, and the form is always the same.
It depends how the site was coded - but every HTTP request returns a Status code - and generally you should be able to use this to validate a login without having to look at the actual response itself.
Take a look at the value of $InvokeResponse.StatusCode after making the request. Try a correct and an incorrect login: does it return the same status code or a different one? If you get a 200 or similar for a valid login attempt and a 403 for an invalid login attempt, those are standard success/fail codes for a login and you can use them in your logic. Another way of testing whether a login worked is to log in and then access a page that only works for logged-in users (try the same test on the response code of that page). If both tests give the same code for success and failure, you need to use the content to decide whether you logged in or not.
You can check the Status code like this:
$r = Invoke-WebRequest -Uri https://www.google.com/
Write-Host "Response code $($r.StatusCode)";
if($r.StatusCode -lt 400){
Write-Host "Probably OK";
}else{
Write-Host "Something went wrong";
}
and I get the output:
Response code 200
Probably OK
FYI: codes starting with 2 are success (200, 201 and 204 are common ones), codes starting with 3 are redirects (301, 302 etc.), codes starting with 4 are client errors (404 Not Found, 403 Forbidden etc.) and codes starting with 5 mean something went wrong server side. BUT it's totally arbitrary what code the programmer who made that page decided to send back for any given input, so test thoroughly :-)
Side note: use POST, not GET, when sending credentials. By default most servers write GET request parameters to their log files in plain text, which they do not do for POST bodies. You might want to change that password just in case.
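One caveat with the check above: Invoke-WebRequest raises an error for 4xx and 5xx responses instead of returning them, so the else branch may never run. A minimal sketch that reads the code from the error instead; Get-StatusClass and Get-HttpStatus are hypothetical helper names, and it assumes the exception exposes a Response object, which is the case for ordinary HTTP error responses:

```powershell
# Hypothetical helper: maps a status code to its class.
function Get-StatusClass {
    param([int]$StatusCode)
    switch ([math]::Floor($StatusCode / 100)) {
        2       { 'Success' }
        3       { 'Redirect' }
        4       { 'ClientError' }
        5       { 'ServerError' }
        default { 'Unknown' }
    }
}

# Hypothetical helper: returns the numeric status code even when the
# cmdlet throws because the server answered with 4xx or 5xx.
function Get-HttpStatus {
    param([string]$Uri)
    try {
        [int](Invoke-WebRequest -Uri $Uri -UseBasicParsing).StatusCode
    } catch {
        $response = $_.Exception.Response
        if ($null -eq $response) { throw }   # not an HTTP error (DNS, timeout, ...)
        [int]$response.StatusCode
    }
}

# Example use (performs a real request, so it is left commented out):
# Get-StatusClass (Get-HttpStatus 'https://www.google.com/')
```

PowerShell 7 also offers -SkipHttpErrorCheck on Invoke-WebRequest, which returns the error response instead of throwing.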

Scrape webpage using powershell

I need to scrape a webpage.
I specifically need to extract the section "USDT Funding Market" on the page.
$WebExtract = Invoke-WebRequest -Uri "https://www.kucoin.com/margin/lend/USDT"
However, neither $WebExtract.Content nor $WebExtract.AllElements contains the "USDT Funding Market" section.
The reason for this is that your initial web request only returns the HTML of the page. A lot of modern web applications like this one have some JavaScript under the hood that reaches out to their APIs and fills the page with data after the initial HTML is loaded.
The good news is that with public websites like this you can usually hit the underlying APIs without worrying about authentication, which actually makes parsing the data easier, as it's typically in JSON format.
Try opening the debug console in your browser (usually F12), open the Network tab and refresh the page. You can typically find these API requests/endpoints in there.
Simply copy that URL out and invoke the web request against the API instead, for example:
$request = Invoke-WebRequest -Uri 'https://www.kucoin.com/_api/currency/prices?base=USD&targets='
$json = $request.Content | ConvertFrom-Json
$json.data
This might not be the specific API endpoint/data you're after, but hopefully this points you in the right direction. If you poke around in the console you should be able to find the one you're looking for.
EDIT:
Apologies for the misleading information here, I didn't notice that these numbers were updating in real time on the site. The above is still pretty common for a lot of websites so I'll leave it; however, on sites that are this responsive/live, another commonly used technology is Web Sockets.
If you have a look at the last lines of the console, you should be able to see a request ending with 'transport=websocket' to a URL like this:
wss://ws-web.kucoin.com/socket.io/?token=...%3D%3D&format=json&acceptUserMessage=false&connectId=connect_welcome&EIO=3&transport=websocket
If you select this line and head over to the Messages tab in your browser console/debugger, you'll be able to see the web socket messages being returned.
This looks like the data you're after, but I'm not hugely familiar with querying Web Sockets through PowerShell.
Simply invoking a web request won't work here, as it's using the web sockets protocol to handle the communication. You would also need to find a way to get a valid token for opening a web sockets connection (likely just a web request for that).
Perhaps these posts will help:
http://wragg.io/powershell-slack-bot-using-the-real-time-messaging-api/
How to use a Websocket client to open a long-lived connection to a URL, using PowerShell V2?
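For the web socket route described above, a rough sketch using .NET's System.Net.WebSockets.ClientWebSocket from PowerShell 5 or later. Receive-WebSocketMessage is a hypothetical helper name, and the sketch does not cover obtaining a valid token or any subscribe messages the server may expect; it only connects and reads one text message.

```powershell
# Hypothetical helper: connects to a wss:// URL and returns the first
# complete text message, or fails once the timeout expires.
function Receive-WebSocketMessage {
    param([string]$Url, [int]$TimeoutSeconds = 10)

    $ws  = [System.Net.WebSockets.ClientWebSocket]::new()
    $cts = [System.Threading.CancellationTokenSource]::new([TimeSpan]::FromSeconds($TimeoutSeconds))
    try {
        $ws.ConnectAsync([uri]$Url, $cts.Token).GetAwaiter().GetResult()

        $buffer  = [byte[]]::new(8192)
        $segment = [System.ArraySegment[byte]]::new($buffer, 0, $buffer.Length)
        $text    = [System.Text.StringBuilder]::new()

        # A message can arrive in several frames; read until it is complete.
        # Decoding per frame assumes no multi-byte character is split across frames.
        do {
            $result = $ws.ReceiveAsync($segment, $cts.Token).GetAwaiter().GetResult()
            [void]$text.Append([System.Text.Encoding]::UTF8.GetString($buffer, 0, $result.Count))
        } until ($result.EndOfMessage)

        $text.ToString()
    } finally {
        $ws.Dispose()
        $cts.Dispose()
    }
}

# Example use, with the token left elided as in the URL above:
# Receive-WebSocketMessage -Url 'wss://ws-web.kucoin.com/socket.io/?token=...&transport=websocket'
```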

Invoke webrequest fails to fetch website data even though works in all browsers

PowerShell: Invoke-WebRequest is not working for me for a particular site. I have tried many things, such as changing the TLS settings, but it still fails.
One unusual thing is that it works for the first request and then fails for 15 minutes, which is not the case when browsing the site in a browser. Can anyone help me, please?
Invoke-WebRequest 'https://www1.nseindia.com/products/content/equities/equities/eq_turnapr2020.htm'
For the next request it waits forever and then times out. In a browser (IE, Chrome) that is not the case.
(Screenshot: timeout error. The first request works.)
It was because of the website's anti-scraping mechanism. To bypass it, the request needs to look like a user request made from a browser. For this particular website, the solution was to include headers in the Invoke-WebRequest call.
You can copy the request headers via Chrome Developer Tools > Network tab > (right-click the URL) > Copy > Copy as PowerShell.
Then paste that and take the hashtable of headers (you can then remove the cookies section).
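As a sketch of that last step, the pasted hashtable can be stripped of its cookie entry before it is passed to -Headers. Remove-CookieHeader is a hypothetical helper name, and the header values below are placeholders rather than the ones this site actually requires:

```powershell
# Hypothetical helper: returns a copy of the headers without the Cookie entry.
function Remove-CookieHeader {
    param([hashtable]$Headers)
    $clean = @{}
    foreach ($name in $Headers.Keys) {
        if ($name -ne 'Cookie') { $clean[$name] = $Headers[$name] }
    }
    $clean
}

# Placeholder headers standing in for what "Copy as PowerShell" produces.
$copied = @{
    'Accept-Language' = 'en-US,en;q=0.9'
    'Cache-Control'   = 'max-age=0'
    'Cookie'          = 'placeholder=1'
}
$headers = Remove-CookieHeader $copied

# The request would then look something like this (user agent string is a placeholder):
# Invoke-WebRequest -Uri $uri -Headers $headers -UserAgent 'Mozilla/5.0 ...'
```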

Authentication With Invoke-WebRequest

The login page is this: https://login.procore.com/
I feel like I'm close to getting it to work, but have hit a brick wall due to a lack of understanding of login procedures. Here is the code so far, without the actual sign in information.
$r=Invoke-WebRequest https://login.procore.com/ -SessionVariable fb
$form = $r.Forms[0]
$form.Fields["session_email"] = "xxxxxxxxx"
$form.Fields["session_password"] = "xxxxxxxx"
$r=Invoke-WebRequest ('https://login.procore.com/' + $form.Action) -WebSession $fb -Method $form.Method -Body $form.Fields
Could someone help me understand what is missing? I did notice that $form.Fields contains an empty field named: session_sso_target_url, but honestly have no clue what it means, or how to use it.
You've given me insufficient info to provide a complete answer, because I don't have a login and don't see a way to sign up for a free trial, and you haven't stated what kind of error you are getting.
I hazard a guess that session_sso_target_url relates to federation, which is semantically related to single sign-on (SSO). In federation, an application is configured to accept logins from another login domain. The obvious example in corporateland is ADFS, but any time you see an app that says Login with Facebook or Login with Google, that's the same thing. Federation is a big topic. The meaning of having a target URL is that the browser is often redirected to the identity provider (ADFS / FB / GOOG etc) with the callback URL that the browser should come back to once it is authenticated.
Suffice it to say that I suspect that you need do nothing with this field! And the reason I say this is because I hit it with Fiddler.
You should know about Fiddler. It is a cost-free debugging proxy from Telerik. I am not affiliated with Telerik, but I owe them hours of saved time when web scraping. (It is not the only tool for the job, and if any moderator deems that I am violating site rules, I will be happy to sanitise this post.)
Do this:
Install Fiddler
Set it up to listen on 127.0.0.1:whatever and to be your system proxy
In Tools > Options > HTTPS, set it to decrypt HTTPS (this will replace all certs with auto-generated self-signed ones, so do not leave this running while you perform other tasks)
Set your filters to only include traffic to *.procore.com
Log in through your browser - you should now see web traffic in the left-hand pane. This captured traffic is your baseline.
Select any one web request and look at the Inspectors tab in the right-hand pane. You can look at Raw, Forms, Cookies, etc. This gives you a low-level view of what your client is doing.
Run your code snippet. You can now compare the differences between the baseline and your code, and adjust accordingly.

Workfront AtTask API Authentication Error

I am trying to query the Workfront REST Services from PowerShell
I am using a URL like this
https://ourcompany.attask-ondemand.com/attask/api/v4.0/project/search?apiKey=XYZetc
This returns JSON in both IE and Chrome and works in my Web Service tester.
All this runs behind a corporate proxy obviously.
The PowerShell I am using is
$postResult = Invoke-RestMethod -Uri $URI -Method "GET" -Proxy http://internalproxyname:80 -ProxyUseDefaultCredentials
This fails with an error
{"error":{"class":"com.attask.common.AuthenticationException","message":"You are not currently logged in"}}
This looks like an error at the AtTask end, not at the proxy on our end (I get different errors when running this as a non-authenticated user or with mangled credentials passed to the proxy).
The docs suggest I don't need to be logged in if I was using an apiKey. I am not logged in in the browsers I am using (I don't even have a user account on the workfront instance)
I have trawled various blogs and stack answers to no avail. Can anyone point me in the right direction for figuring out what is going on? or what I might be doing wrong.
I have enabled a trust-all-certs policy and set the validation callback to ignore within PowerShell, but equally I've tried this with these turned off and also investigated various properties on the ServicePointManager. I can produce any number of different errors/issues, but the closest I get seems to be the above.
Oh and the Workfront API docs and examples being wrong didn't help me when I was getting started :-)
many thanks
Steve
OK, this was me being stupid. There was a bug in the code generating the URI (an extra slash), and the AtTask default error response for a mangled request is an authentication error.
For reference the URL needs to be in the form shown in my original post. Don't miss off the api version number and don't use a port number as the code samples show.
Always look for the simple things first (I should remember that)
Doh!