I am having trouble downloading a txt file from a website. The script below saves HTML markup instead of the actual txt file and its contents.
$WebClient = New-Object System.Net.WebClient
$WebClient.DownloadFile("https://thegivebackproject.org/CheckStatus.txt", "D:\CheckStatus.txt")
Short Answer
The server is doing browser sniffing to send different responses based on the User-Agent header in your request. You can get the response you want by sending a canned user agent string:
$useragent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36"
Invoke-WebRequest -URI https://thegivebackproject.org/CheckStatus.txt -OutFile c:\temp\CheckStatus.txt -UserAgent $useragent
Long Answer
The server responding to the URL you're hitting is doing browser sniffing to decide what content to return. If you give it a User-Agent header that it recognises, it will return the response you're expecting (i.e. the literal text "Azeemkhan-WaseemRaza").
If you don't include a User-Agent header (and $WebClient.DownloadFile doesn't include one), the server responds with an HTML page instead.
You can see this behaviour yourself if you install an HTTP trace tool like Fiddler. When you hit the page in a browser you see this HTTP request and response pair:
request
GET https://thegivebackproject.org/CheckStatus.txt HTTP/1.1
Host: thegivebackproject.org
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36
Sec-Fetch-User: ?1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Accept-Encoding: gzip, deflate, br
Accept-Language: en-GB,en-US;q=0.9,en;q=0.8
Cookie: SPSI=ee952ba44e33e958f963807ede78624b
response
HTTP/1.1 200 OK
Server: nginx
Date: Tue, 12 Nov 2019 08:13:57 GMT
Content-Type: text/plain
Content-Length: 20
Connection: keep-alive
Last-Modified: Thu, 07 Nov 2019 16:15:48 GMT
Accept-Ranges: bytes
X-Cache: MISS
Azeemkhan-WaseemRaza
but when you use $WebClient.DownloadFile you see this instead:
request
GET https://thegivebackproject.org/CheckStatus.txt HTTP/1.1
Host: thegivebackproject.org
response
HTTP/1.1 200 OK
Server: nginx
Date: Tue, 12 Nov 2019 08:14:21 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: SPSI=9c24f8993046ef610e25cc727c4a4ae2; Path=/
Set-Cookie: adOtr=obsvl; Expires=Thu, 2 Aug 2001 20:47:11 UTC; Path=/
Set-Cookie: UTGv2=D-h4d40f620bfdd6c3b77b035ee99f96621134; Expires=Wed, 11-Nov-20 08:14:21 GMT; Path=/
cache-control: no-store, no-cache, max-age=0, must-revalidate, private, max-stale=0, post-check=0, pre-check=0
Vary: Accept-Encoding
X-Cache: MISS
Accept-Ranges: bytes
5908
<!doctype html>
<head>
<meta charset="utf-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<title>StackPath</title>
<style>
* {
box-sizing: border-box;
}
... etc...
The workaround is to include a recognised User-Agent header in your request, which is easier to do if you use Invoke-WebRequest as @BiNZGi suggested, rather than the WebClient class - see the "short answer" above for the code.
Also, note that this sniffing behaviour with the User-Agent is specific to the "thegivebackproject.org" website and isn't necessarily true for other websites - as a rule of thumb, you don't always need to include a User-Agent header.
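For readers scripting the same check outside PowerShell, the workaround can be sketched in Python with the standard library. This is only an illustration under assumptions: the helper names `build_request`, `is_plain_text` and `download` are made up for this example, and the user-agent string is simply the one captured in the browser trace above; whether the server keeps accepting it is not guaranteed.

```python
import urllib.request

URL = "https://thegivebackproject.org/CheckStatus.txt"
# User-Agent value copied from the browser trace above.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36"
)


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def is_plain_text(content_type: str) -> bool:
    """True when a Content-Type header value denotes text/plain."""
    return content_type.split(";")[0].strip().lower() == "text/plain"


def download(url: str, destination: str, user_agent: str = USER_AGENT) -> None:
    """Save the body to `destination`, rejecting anything that is not text/plain."""
    with urllib.request.urlopen(build_request(url, user_agent)) as response:
        content_type = response.headers.get("Content-Type", "")
        if not is_plain_text(content_type):
            # This is the case seen in the second trace: an HTML page came back.
            raise ValueError(f"expected text/plain, got {content_type!r}")
        body = response.read()
    with open(destination, "wb") as handle:
        handle.write(body)
```

Calling `download(URL, "CheckStatus.txt")` would perform the request. Checking the Content-Type before writing means a sniffing-related HTML page fails loudly instead of being saved under a .txt name.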
You can use the easier Invoke-WebRequest:
Invoke-WebRequest -URI https://thegivebackproject.org/CheckStatus.txt -OutFile D:\CheckStatus.txt
Related
I am quite new to API testing and any topics in networking in general.
I'm attempting to retrieve BC, Canada school names and rankings from a website. The target data is in the right table, available after a prompt to choose a province (here I selected British Columbia). I am using the Chrome developer tools to analyze the requests/responses after selecting the province; however, I am getting gibberish where I expect a JSON response.
After choosing the province, 3 XMLHttpRequests are made to compareschoolrankings.org/api/v1/ with response headers of
content-type: application/json
I assume the responses to these requests hold my target data; however, the response content is gibberish when I would otherwise expect it in JSON format, for example:
3dd3U2FsdGVkX1/TDJgJ2Kpx3ekEf3yT9DaZMp8nDRMZJlP85M8RWOruj5tm1Qu6c2UF1ifJVFMU8+XbeXvIbWZ/Or ... bo/XkaOUHOnWGMhpFIC8mYz
Here is one request (its headers) that I expect is requesting the target data:
:authority: www.compareschoolrankings.org
:method: GET
:path: /api/v1/schools.json?province=bc&ht=NzQ1Mjg
:scheme: https
accept: application/json, text/plain, */*
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
cookie: _ga=GA1.2.217437611.1609538525; _gid=GA1.2.1976080400.1609538525; _gat_UA-3850680-10=1; _hjTLDTest=1; _hjid=1d1327a2-dd14-4388-b08f-ef670f6178cf; _hjFirstSeen=1
referer: https://www.compareschoolrankings.org/
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: same-origin
user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36
x-requested-with: XMLHttpRequest
Here is the corresponding response header:
accept-ranges: bytes
age: 14440
cache-control: max-age=21600, public
content-encoding: gzip
content-language: en
content-length: 448428
content-type: application/json
date: Fri, 01 Jan 2021 22:02:13 GMT
etag: W/"1609524093"
expires: Sun, 19 Nov 1978 05:00:00 GMT
last-modified: Fri, 01 Jan 2021 18:01:33 GMT
server: nginx
strict-transport-security: max-age=300
vary: Accept-Encoding, Cookie
via: 1.1 varnish, 1.1 varnish, 1.1 varnish
x-cache: MISS, HIT, HIT
x-cache-hits: 0, 1, 1
x-content-type-options: nosniff
x-drupal-cache: MISS
x-frame-options: SAMEORIGIN
x-generator: Drupal 8 (https://www.drupal.org)
x-pantheon-styx-hostname: styx-fe3fe4-h-6c4765d776-86gmx
x-served-by: cache-yyz4534-YYZ, cache-sea4433-SEA, cache-sea4483-SEA
x-styx-req-id: 623d1ec1-4c5b-11eb-bbc4-620e110c7f7f
x-timer: S1609538534.794192,VS0,VE0
x-ua-compatible: IE=edge
Question: Why is the response not in JSON format when both the request and response headers indicate the content to be so? Where should I be looking to retrieve my target data?
Any help or references would be much appreciated!
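One observation, offered as a hypothesis rather than a confirmed answer: the sample body contains the characters `U2FsdGVkX1`, which is how the base64 encoding of the ASCII bytes `Salted__` always begins. That prefix is the header of OpenSSL's salted password-based encryption format, which libraries such as CryptoJS also emit. If that is what is happening here, the body is ciphertext that the page's own JavaScript decrypts, and the content-type header merely describes what it becomes afterwards. The Python sketch below only checks for that header in the quoted sample; the function name `find_salted_header` is made up for this example, and the meaning of the leading `3dd3` is unknown.

```python
import base64
import binascii

# base64 of the ASCII bytes b"Salted__" always begins with these ten characters.
MARKER = "U2FsdGVkX1"


def find_salted_header(body: str):
    """Return (offset, salt) if `body` contains an OpenSSL-style salted header, else None."""
    offset = body.find(MARKER)
    if offset == -1:
        return None
    chunk = body[offset:offset + 24]  # 24 base64 characters decode to 18 bytes
    if len(chunk) < 24:
        return None
    try:
        raw = base64.b64decode(chunk, validate=True)
    except (binascii.Error, ValueError):
        return None
    if not raw.startswith(b"Salted__"):
        return None
    return offset, raw[8:16]  # 8 magic bytes, followed by an 8-byte salt


# Beginning of the sample response quoted in the question (the rest is elided there).
sample = (
    "3dd3U2FsdGVkX1/TDJgJ2Kpx3ekEf3yT9DaZMp8nDRMZJlP85M8RWOruj5tm1Qu6"
    "c2UF1ifJVFMU8+XbeXvIbWZ/Or"
)
header = find_salted_header(sample)
```

Finding the header says nothing about the key or cipher settings; those would have to be located in the scripts the site loads.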
My project setup is based on Angular generated from the CLI, and I'm facing issues with displaying local German characters.
My meta in index.html:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
My editor VS Code is set to save files in UTF-8:
"[html]": {
    "editor.detectIndentation": true,
    "editor.insertSpaces": false,
    "files.encoding": "utf8",
},
"[typescript]": {
    "files.encoding": "utf8",
},
"files.encoding": "utf8",
My document request headers:
GET /home?schedule=2 HTTP/1.1
Host: localhost:4200
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Referer: http://localhost:4200/home?schedule=2
Accept-Encoding: gzip, deflate, br
Accept-Language: pl-PL,pl;q=0.9,en-US;q=0.8,en;q=0.7
Cookie: utilityType=HEAT
And respective response headers:
HTTP/1.1 200 OK
X-Powered-By: Express
Access-Control-Allow-Origin: *
Accept-Ranges: bytes
Content-Type: text/html; charset=UTF-8
Content-Length: 898
ETag: W/"382-xsHtQLNHNB+FVVszGe2IsoXNXEI"
Date: Fri, 31 Aug 2018 09:05:10 GMT
Connection: keep-alive
I'm using Google Fonts Open Sans with latin-extended set:
<link href="https://fonts.googleapis.com/css?family=Open+Sans&subset=latin-ext" rel="stylesheet">
Any rendered output still looks like below:
I'm running out of ideas about what else may affect this.
OK, as already mentioned by @n.m., the issue was that the localisation files were produced by an external translation team, which saved all text feeds in ANSI rather than UTF-8 encoding. Converting the files from ANSI to UTF-8 resolved the issue.
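A conversion like that can be sketched in a few lines of Python. The assumptions here are that "ANSI" means the Windows-1252 code page (common for Western European text on Windows, but not confirmed by the post) and that the helper names `reencode` and `convert_file` are made up for this example.

```python
from pathlib import Path


def reencode(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes from a legacy code page and re-encode them as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


def convert_file(path: Path, source_encoding: str = "cp1252") -> None:
    """Rewrite one translation file in place as UTF-8."""
    path.write_bytes(reencode(path.read_bytes(), source_encoding))


ansi_bytes = "Grüße".encode("cp1252")
# Reading the legacy bytes as UTF-8 is what produces the broken characters.
broken = ansi_bytes.decode("utf-8", errors="replace")
# Decoding with the real source encoding first restores the text.
fixed = reencode(ansi_bytes).decode("utf-8")
```

`convert_file` should be run only once per file: applying it to a file that is already UTF-8 would misread the multi-byte sequences and corrupt the text a second time.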
I have a small site where I have a mailing list contact form in an iframe, and once it's submitted, a callback page I registered with the mailing list service is called, displaying in the iframe and asking the user to check their email. The page I registered is http://mydomain.com/verify.html. In verify.html I use "window.parent.document.getElementById('lightbox4').style.display='none';" to close the lightbox div that contains the iframe. This all works well, as long as the user initially visits http://mydomain.com, but if they visit http://www.mydomain.com, then calling "window.parent.document.getElementById('lightbox4').style.display='none';" doesn't work, because it's a cross-domain request.
So, no problem I thought, I'll just create a redirect rule to convert calls from www.mydomain.com, to mydomain.com. But now I'm getting the error "This webpage has a redirect loop" when I try to go to either www.mydomain.com or mydomain.com. In IIS7, I have two bindings, one for mydomain.com and one for www.mydomain.com. My DNS zone has an A record for mydomain.com, and a CNAME for www.mydomain.com.
Am I doing something stupid here? Is there some way to debug this? I can see in Firefox, using the Live HTTP headers plugin, that the URL is redirected properly from www.mydomain.com to mydomain.com, but then it keeps trying to redirect mydomain.com to mydomain.com, creating the endless loop:
http://www.mydomain.com/
GET / HTTP/1.1
Host: www.mydomain.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:20.0) Gecko/20100101 Firefox/20.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
HTTP/1.1 302 Redirect
Content-Type: text/html; charset=UTF-8
Location: http://mydomain.com/
Server: Microsoft-IIS/7.0
X-Powered-By: ASP.NET
Date: Sun, 21 Apr 2013 15:20:12 GMT
Content-Length: 150
----------------------------------------------------------
http://mydomain.com/
GET / HTTP/1.1
Host: mydomain.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:20.0) Gecko/20100101 Firefox/20.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
HTTP/1.1 302 Redirect
Content-Type: text/html; charset=UTF-8
Location: http://mydomain.com/
Server: Microsoft-IIS/7.0
X-Powered-By: ASP.NET
Date: Sun, 21 Apr 2013 15:20:12 GMT
Content-Length: 150
----------------------------------------------------------
http://mydomain.com/
GET / HTTP/1.1
Host: mydomain.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:20.0) Gecko/20100101 Firefox/20.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
HTTP/1.1 302 Redirect
Content-Type: text/html; charset=UTF-8
Location: http://mydomain.com/
Server: Microsoft-IIS/7.0
X-Powered-By: ASP.NET
Date: Sun, 21 Apr 2013 15:20:12 GMT
Content-Length: 150
----------------------------------------------------------
and it keeps going until "This webpage has a redirect loop" is displayed.
I expect I have to create a new virtual directory for www.mydomain.com and then redirect that to mydomain.com, but that seems awkward.
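The trace is consistent with a redirect that fires for every request the site receives, whichever binding it arrived on. That is an inference from the headers, not something the post confirms. The Python model below is not IIS configuration, and its function names are made up for this example; it only illustrates why a rule without a host condition never terminates, while a rule that checks the Host header stops after one hop.

```python
from typing import Callable, Optional
from urllib.parse import urlsplit

CANONICAL_HOST = "mydomain.com"

Rule = Callable[[str, str], Optional[str]]


def unconditional_redirect(host: str, path: str) -> Optional[str]:
    """Rule with no host condition: fires for every request, canonical or not."""
    return f"http://{CANONICAL_HOST}{path}"


def conditional_redirect(host: str, path: str) -> Optional[str]:
    """Rule that fires only when the Host header is not already canonical."""
    if host.lower() == CANONICAL_HOST:
        return None  # serve the page, no redirect
    return f"http://{CANONICAL_HOST}{path}"


def follow(rule: Rule, host: str, path: str = "/", limit: int = 10) -> int:
    """Count redirects until content is served; return -1 once `limit` is exceeded."""
    hops = 0
    while hops <= limit:
        target = rule(host, path)
        if target is None:
            return hops
        host = urlsplit(target).hostname or ""
        hops += 1
    return -1
```

Under this model, `follow(conditional_redirect, "www.mydomain.com")` ends after a single redirect, whereas `follow(unconditional_redirect, "www.mydomain.com")` hits the limit, which mirrors the repeated 302 responses in the trace.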
I am using Rails 4.0 to develop a Facebook page_tab. I get blank content shown on the Facebook tab page.
I think the issue is related to Turbolinks. The following are the Firefox request and response headers:
Response header
HTTP/1.1 200 OK
Date: Mon, 01 Apr 2013 08:54:54 GMT
Status: 200 OK
Connection: close
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
X-UA-Compatible: chrome=1
X-XHR-Current-Location: /page_tab
Content-Type: text/html; charset=utf-8
Etag: "5d34060006e527f1a21db545df3d919f"
Cache-Control: max-age=0, private, must-revalidate
Set-Cookie: _likenotlike_session=SEhKbk5oZ0FHT2o0RkRMK3k2OThidHY1Yk5HYjdIWGNkNFIrWisxbkVKRitLT2tJM2d2b1NVV0xQYW5Qc015L0ljVjdDWCtITWR4cUhLc2VjK3hGUHNCbHAzb0YxV1F4OUNaa0hudDE0MkFZRlhYUGgxK2M5eDBNMTRIZzdhZXVyRTBmZEx3Q1RKaXRrZFJwaUYyY2JMdUNpSmlZRmhNS0Z6dGFEMEE5b2RLOXJGdWF0Z1NHcDR1N0ZleVgvZDRJLS1KcjhndzRuUjJaSXZnd1lNdjUyNTJBPT0%3D--a51e845979d81ace643d14b399ffa655ece63d79; path=/; HttpOnly
X-Request-Id: aac0e275-92b7-4b4b-9be7-b811ff9dec29
X-Runtime: 0.024202
Request Header
POST /page_tab HTTP/1.1
Host: localhost:60000
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:20.0) Gecko/20100101 Firefox/20.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://static.ak.facebook.com/platform/page_proxy.php?v=5
Cookie: fbm_353759128067702=base_domain=.localhost; fbm_470420673030979=base_domain=.localhost; request_method=POST; _likenotlike_session=T2o2dVZUSkhxUDhWdDJyWGsvQmYxZHVGVGszYy9pc2VIdGs3OWJ0YkRQTSt2eTJtR2pxTDZLSFRpbWVDamx2ZFVxU2pJRENNRzl2elNqMkF4Q01hcTlWZkZNNUVnSy9ucnJrUWQ0YWFheUJqRklsaEQ1RlM5ZGN1MEhGV0NpQ0E5bjc0VXZoQThuVzJjbjFQTmpZeUVzK2M1anRBamZqU3VwZVlYUlNpQmRnYnlVNWJZTk5wc3dZTEZpR0lyWTE2LS1tSkRHb3JpNGM4U205bEdxMEpkOE5nPT0%3D--85ea3314a43d08dda9d00218a5045968ef040d0b
Connection: keep-alive
In the response header there are X-* headers (for example X-XHR-Current-Location) that I think are related to Ajax. So I think Rails together with Turbolinks treats the request as an Ajax request, but it is actually a normal POST request, as you can see from the request header above.
I'd really appreciate your help.
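The solution to the problem is at the following link:
http://conpanna.net/en-us/blog/5185b5ce79ec73ae54000003
Just add response.headers["X-Frame-Options"] = "GOFORIT"
and everything works. This replaces the X-Frame-Options: SAMEORIGIN value seen in the response header above, which stops browsers from displaying the page in a frame on another origin such as Facebook's.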
I have three websites (example1.com, example2.com, example3.com) hosted on a server. There is a page (test.php) on example1.com with just the code below in it:
<?php
header('Location:http://example2.com/a.php');
?>
When I browse to test.php, it goes to http://example1.com/a.php. It doesn't recognise that this is a URL on another domain; it tries to find the page on itself.
But when I put http://google.com instead of example2.com/a.php, it works correctly. I am really confused.
What is the problem? Should I set some configuration on the server?
(I am the administrator of the hosting server.)
PS: The server is behind a Pound server.
Edited:
Here's the Firebug Net output for example1.com/test.php
Response Headers:
HTTP/1.1 302 Found
Date: Tue, 09 Oct 2012 09:03:34 GMT
Server: Apache/2.2.16 (Debian)
Location: http://example1.com/a.php
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 21
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=utf-8
Request Headers:
Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding gzip, deflate
Accept-Language en-us,en;q=0.5
Connection keep-alive
Cookie mycookie
Host example1.com
User-Agent Mozilla/5.0 (X11; Linux i686; rv:14.0) Gecko/20100101 Firefox/14.0.1
The problem is solved. It was caused by the Pound server configuration: the 'RewriteLocation' entry in the Pound configuration must be set to 2 so that Pound doesn't change the redirect location.
Anyway, thank you for answering.