python urllib.request not getting same html as my browser - redirect

Trying to get html code of http://groupon.cl/descuentos/santiago-centro with the following python code:
import urllib.request
url="http://groupon.cl/descuentos/santiago-centro"
request = urllib.request.Request(url, headers = {'user-agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'})
response = urllib.request.urlopen(request)
return response.read().decode('utf-8')
I'm getting the HTML for a page that asks for my location. If I open the same link manually in my browser (with no cookies involved, even in a freshly installed browser), I go directly to a page with discount promotions. It seems some redirect is not taking place for urllib. I am using the user-agent header to try to get the behaviour of a typical browser, but I've had no luck.
How could I get the same html code as with my browser?

Try running this command:
wget -d http://groupon.cl/descuentos/santiago-centro
You will see wget print two HTTP responses and save the final page to a file:
- HTTP/1.1 302 Moved Temporarily
- HTTP/1.1 200 OK
The content of that file is the HTML you want.
The first response code is 302, so urllib.request.urlopen makes a second request. But it does not
send back the cookie set by the first response, so the server cannot understand the
second request and you get a different page.
The http.client module does not follow 301 or 302 responses by itself, so you can handle the redirect and the cookie manually:
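import http.client

conn = http.client.HTTPConnection("groupon.cl")
# First request: expect a redirect that also sets a cookie.
conn.request("GET", "/descuentos/santiago-centro")
resp = conn.getresponse()
print(resp.status)        # 301 or 302
print(resp.getheaders())  # look for Set-Cookie and Location
resp.read()               # drain the body before reusing the connection
# Get the cookie (name=value part only) from the first response.
headers = {"Cookie": resp.getheader("Set-Cookie", "").split(";")[0]}
# Second request: send the cookie back. Replace "/" with the path
# given in the Location header of the first response.
conn.request("GET", "/", headers=headers)
resp = conn.getresponse()
# Get the response page.
html = resp.read().decode("utf-8")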
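The manual steps above can also be left to urllib itself. The following is a minimal sketch, assuming that the missing cookie really is the cause: an opener built with HTTPCookieProcessor stores cookies from the 302 response and replays them on the redirected request. The local DemoHandler server, its /start and /deals paths, and the city=santiago cookie are hypothetical stand-ins so the example runs without touching the real site.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_cookie_opener(*extra_handlers):
    """Return an opener that stores cookies and replays them on redirects."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar), *extra_handlers)


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server: redirects once and sets a cookie."""

    def do_GET(self):
        if self.path == "/start":
            self.send_response(302)
            self.send_header("Set-Cookie", "city=santiago; Path=/")
            self.send_header("Location", "/deals")
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        has_cookie = "city=santiago" in (self.headers.get("Cookie") or "")
        body = b"deals page" if has_cookie else b"choose your location"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_from_demo_server(with_cookies):
    """Fetch /start from the local demo server, with or without a cookie jar."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    no_proxy = urllib.request.ProxyHandler({})  # local demo: bypass any proxy
    try:
        url = "http://127.0.0.1:%d/start" % server.server_port
        if with_cookies:
            opener = make_cookie_opener(no_proxy)
        else:
            opener = urllib.request.build_opener(no_proxy)
        with opener.open(url, timeout=5) as response:
            return response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()
```

Against the real site, make_cookie_opener().open(request) would take the place of urllib.request.urlopen(request) in the question's code; whether that alone is enough depends on what the server actually checks.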

Related

301 moved permanently with socket.http

In Python (and my browser), I am able to send a request to https://www.devrant.com/api/devrant/rants?app=3&sort=algo&limit=10&skip=0 and get a response, as expected, but with Lua, I get HTTP/1.1 301 Moved Permanently. Here is what I have tried so far:
http = require("socket.http");
print(http.request("https://www.devrant.com/api/devrant/rants?app=3&sort=algo&limit=10&skip=0"))
which outputs an HTTP error page (moved permanently) and
301 table: 0x8f32470 http/1.1 301 Moved Permanently
the table's contents are:
location https://www.devrant.com/api/devrant/rants?app=3&sort=algo&limit=10&skip=0
content-type text/html
server nginx/1.10.0 (Ubuntu)
content-length 194
connection close
date Mon, 11 Dec 2017 01:41:35
Why does only Lua get this error? If I request to google, I get the google home page HTML. If I request to status.mojang.com, I get the mojang server statuses in a JSON response string, so the socket is functional for certain.
That's because you are using socket.http to request a page from an https URL. Since socket.http doesn't handle https, it sends the request to port 80, which redirects to the https URL; the socket library doesn't follow that redirect because it doesn't "know" what to do with https, so it simply reports the 301.
You need to install luasec and use ssl.https instead of socket.http, which will make it work.
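For comparison, Python's standard library draws the same line between plain HTTP and HTTPS clients. This is only an analogy to the socket.http / ssl.https split, not Lua code; connection_for is a hypothetical helper name. It picks the connection class from the URL scheme and creates the object without connecting.

```python
import http.client
from urllib.parse import urlsplit


def connection_for(url):
    """Return an unconnected http.client connection matching the URL scheme."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        # TLS-capable client, default port 443
        return http.client.HTTPSConnection(parts.hostname, parts.port)
    # plain-text client, default port 80
    return http.client.HTTPConnection(parts.hostname, parts.port)
```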

Uber API issue with CORS

First time asking a question here. I'm a beginner at this, but I'm truly stumped by the problem I'm facing.
Browsers in use:
Safari and Firefox (both on Mac OS Sierra)
Firefox (Linux - Ubuntu 16.04.2)
I am registered as an Uber Developer and have registered an App in the Dashboard. I'm only using the Server Token for authentication at the moment. In the Dashboard, I have set the following entries in the "Authorizations" tab of the App for CORS (Optional URI for CORS Support):
http://localhost:8000 <-- web server in my PC
https://subdomain.mydomain.com <--- remote web server
A few months ago I created a web app using HTML, CSS and JS (with jQuery v2.2.4) to play around with the Ride Estimates API and was able to get it to report data for many locations in my area successfully. Somehow it no longer works. I'm trying to fix that and improve the functionality. However, I just can't get past the initial query to the API because of CORS issues that did not exist before.
My API URL is:
https://api.uber.com/v1/estimates/price?start_latitude=8.969145&start_longitude=-79.5177675&end_latitude=8.984104&end_longitude=-79.517467&server_token={*********SERVER*TOKEN**********}
When I paste that in the address bar of the browser I get valid JSON:
{"prices":[{"localized_display_name":"uberX","distance":1.58,"display_name":"uberX","product_id":"811c3224-5554-4d29-98ae-c4366882011f","high_estimate":3,"surge_multiplier":1.0,"minimum":2,"low_estimate":2,"duration":420,"estimate":"2-3\u00a0$","currency_code":"USD"},{"localized_display_name":"X English","distance":1.58,"display_name":"X English","product_id":"8fe2c122-a4f0-43cc-97e0-ca5ef8b57fbc","high_estimate":4,"surge_multiplier":1.0,"minimum":3,"low_estimate":3,"duration":420,"estimate":"3-4\u00a0$","currency_code":"USD"},{"localized_display_name":"uberXL","distance":1.58,"display_name":"uberXL","product_id":"eb454d82-dcef-4d56-97ca-04cb11844ff2","high_estimate":4,"surge_multiplier":1.0,"minimum":3,"low_estimate":3,"duration":420,"estimate":"3-4\u00a0$","currency_code":"USD"},{"localized_display_name":"Uber Black","distance":1.58,"display_name":"Uber Black","product_id":"ba49000c-3b04-4f54-8d50-f7ae0e20e867","high_estimate":6,"surge_multiplier":1.0,"minimum":4,"low_estimate":4,"duration":420,"estimate":"4-6\u00a0$","currency_code":"USD"},{"localized_display_name":"Uber SUV","distance":1.58,"display_name":"Uber SUV","product_id":"65aaf0c2-655a-437d-bf72-5d935cf95ec9","high_estimate":7,"surge_multiplier":1.0,"minimum":5,"low_estimate":5,"duration":420,"estimate":"5-7\u00a0$","currency_code":"USD"}]}
I then set up the JS (with jQuery) code in the webpage:
var url = "https://api.uber.com/v1/estimates/price?start_latitude=8.969145&start_longitude=-79.5177675&end_latitude=8.984104&end_longitude=-79.517467&server_token={*********SERVER*TOKEN**********}";
$.getJSON(url, function(result){
    console.log(result);
});
Uploading the HTML and JS to my remote web server and then loading the webpage in any of my browsers yields a 200 status from Uber API. However, the console log shows CORS blocking my request (PROBLEM #1):
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://api.uber.com/v1/estimates/price?start_latitude=8.969145&start_longitude=-79.5177675&end_latitude=8.984104&end_longitude=-79.517467&server_token={*********SERVER*TOKEN**********}. (Reason: CORS header ‘Access-Control-Allow-Origin’ missing).
Then, in the Inspector view of both Mac browsers, under the Network / Resources areas, I see the 200 status from the GET request. However, the response comes with this message (PROBLEM #2):
SyntaxError: JSON.parse: unexpected end of data at line 1 column 1 of the JSON data
The Request Headers are:
GET /v1/estimates/price?start_latitude=8.969145&start_longitude=-79.5177675&end_latitude=8.984104&end_longitude=-79.517467&server_token={*********SERVER*TOKEN**********} HTTP/1.1
Host: api.uber.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:52.0) Gecko/20100101 Firefox/52.0
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://subdomain.domain.com/Uber/index.html
Origin: https://subdomain.domain.com
Connection: keep-alive
The Response Headers are:
HTTP/1.1 200 OK
Server: nginx
Date: Sun, 19 Mar 2017 22:26:31 GMT
Content-Type: application/json
Transfer-Encoding: chunked
Connection: keep-alive
Content-Geo-System: wgs-84
Content-Language: en
X-Rate-Limit-Limit: 2000
X-Rate-Limit-Remaining: 1998
X-Rate-Limit-Reset: 1489964400
X-Uber-App: uberex-nonsandbox, optimus, migrator-uberex-optimus
Strict-Transport-Security: max-age=604800
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Content-Encoding: gzip
In Firefox for Linux I sometimes don't get the syntax error; I always seem to get it in the Mac browsers. In Linux, when I do get that error and click the "Edit and Resend" button (resending the headers without actually editing them), the syntax error disappears and the response text shows the Uber API object that is supposed to be there... but I still get the CORS blocked message in the console log. I really don't understand why this is; it seems contradictory. In the end, I am unable to use the API data that, using the same method months ago, I could get for several dozen locations.
I have looked for answers in similar questions but so far have found none that apply to my case. Any help will be greatly appreciated. Getting really frustrated... really stuck here.
This issue was caused by the API not including the header correctly. It has been resolved and the API is now working as expected. Also note that the Access-Control-Allow-Origin header is only returned in a response if an Origin header is specified in the request.
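The point about the Origin header can be checked outside the browser. Here is a minimal sketch in Python, with hypothetical helper names (build_cors_probe, cors_allows) that are not part of any Uber SDK: one function builds a request carrying an Origin header, the other decides whether a set of response headers would let a page at that origin read the response. It is a simplification that ignores the credentials case.

```python
import urllib.request


def build_cors_probe(url, origin):
    """Build a GET request that carries an Origin header, as a browser would."""
    return urllib.request.Request(url, headers={"Origin": origin})


def cors_allows(response_headers, origin):
    """Return True if Access-Control-Allow-Origin permits `origin`."""
    allowed = response_headers.get("Access-Control-Allow-Origin")
    return allowed is not None and allowed.strip() in ("*", origin)
```

With a live endpoint, the response from urllib.request.urlopen(build_cors_probe(url, origin)) exposes its headers as response.headers, which can be passed straight to cors_allows.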

Jsoup redirect does not behave the same as browser redirect

I'm accessing a server through Jsoup (latest version, 1.10.2) to extract some data from a page.
The server is open to anonymous users, but it uses a redirect chain to grant a session ID to each user.
This is the sequence I get in the browser:
First request to http://SERVER_HOST/page
resp: 302 Redirect to Location http://SSO_SERVER
Follow redirect, opening http://SSO_SERVER
resp: 302 Redirect to Location http://SERVER_HOST/page?sessionID=123456
Follow redirect, opening http://SERVER_HOST/page?sessionID=123456
resp: 200 :)
Unexpectedly, the redirect chain fails with Jsoup. Look at the difference in step 2:
First request to http://SERVER_HOST/page (without cookies)
resp: 302 Redirect to Location http://SSO_SERVER
Follow redirect, opening http://SSO_SERVER
resp: 302 Redirect to Location /shared/SSO/http%3a%2f%2SERVER_HOST/page%3dsessionID=123456
Follow redirect, opening http://SSO_SERVER/shared/SSO/http%3a%2f%2SERVER_HOST/page%3dsessionID=123456
resp: 400 not found :(
In step 2, the redirect location in the server response starts with "/" rather than "http://", so step 3 connects to the wrong host.
Why do I get a different location in the server response at step 2 depending on whether the request comes from the browser or from Jsoup?
I set the same headers on the Jsoup request as on the browser request:
Response response = Jsoup.connect(link)
    .userAgent("Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36")
    .header("Accept","text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8")
    .header("Accept-Encoding", "gzip, deflate, sdch")
    .header("Accept-Language", "it,en-US;q=0.8,en;q=0.6")
    .header("Upgrade-Insecure-Requests", "1")
    .method(Method.GET)
    .followRedirects(true).execute();
There was a bug in Jsoup.Connect() in how a query string in a redirect header was handled.
That's fixed now in this commit. You can build off HEAD to get access to the fix, and it will be available in the next release (1.10.3).
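Why a Location that starts with "/" lands on the wrong host can be shown with standard URL resolution. This is a minimal sketch in Python rather than Java, using placeholder hosts (sso.example, server.example) and a shortened path that are assumptions and not the real servers: a relative Location is resolved against the URL that returned it, so it stays on the SSO host.

```python
from urllib.parse import urljoin


def resolve_redirect(current_url, location):
    """Resolve a Location header value against the URL that returned it."""
    return urljoin(current_url, location)


# An absolute Location switches host; a relative one stays on the current host.
absolute = resolve_redirect("http://sso.example/login",
                            "http://server.example/page?sessionID=123456")
relative = resolve_redirect("http://sso.example/login", "/shared/SSO/page")
print(absolute)  # http://server.example/page?sessionID=123456
print(relative)  # http://sso.example/shared/SSO/page
```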

Failing to use eZ Platform JS REST API client with cross domain

Context and troubles
I'm currently building a web application on top of ezPlatform & Symfony.
My goal now is to query this app from an external website using a JS client (the JS REST client of eZ Platform: CAPI.js).
I tested my script locally (on the app itself, so same domain) and everything is fine: I can GET and POST data.
But testing this script on an external website (CORS requests) didn't work. I am stuck with 2 different problems:
Server side: the response headers do not contain Access-Control-Allow-Methods
Client side: no session cookies are sent with the requests
Details
Problem 1: no "allow_methods" header
In Chrome I always get this error:
XMLHttpRequest cannot load http://api.ezplatform.lan/api/ezp/v2/user/sessions. Response for preflight has invalid HTTP status code 405
Note that, on the server side, the nelmio_cors bundle is used to configure the headers. The config:
nelmio_cors:
    paths:
        '^/api/ezp/v2/':
            max_age: 3600
            allow_credentials: true
            allow_origin: ['*']
            allow_methods: ['POST', 'PUT', 'GET', 'DELETE', 'OPTIONS']
            expose_headers: []
And now, here are the details of a failing preflight request :
GENERAL
Request URL:http://api.ezplatform.lan/api/ezp/v2/user/sessions
Request Method:OPTIONS
Status Code:405 Method Not Allowed
Remote Address:192.168.1.82:80
RESPONSE HEADERS
Access-Control-Allow-Credentials:true
Access-Control-Allow-Headers:authorization, accept, content-type, x-csrf-token, destination, x-siteaccess
Access-Control-Allow-Origin:http://www.externalsite.lan
Access-Control-Max-Age:3600
Cache-Control:private
Connection:Keep-Alive
Content-Length:0
Content-Type:text/html; charset=UTF-8
Date:Tue, 13 Dec 2016 15:24:44 GMT
Keep-Alive:timeout=5, max=99
Server:Apache/2.4.23 (Ubuntu)
Vary:X-User-Hash
REQUEST HEADERS
Accept:*/*
Accept-Encoding:gzip, deflate, sdch
Accept-Language:fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4
Access-Control-Request-Headers:content-type
Access-Control-Request-Method:POST
Cache-Control:no-cache
Connection:keep-alive
Host:api.ezplatform.lan
Origin:http://www.externalsite.lan
Pragma:no-cache
Referer:http://www.externalsite.lan/
User-Agent:Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36
In the response headers, there is no Access-Control-Allow-Methods despite the nelmio_cors config.
Digging into the nelmio_cors code, I realized that the "allow_methods" config is retrieved but then overwritten by something else, and this is where it becomes obscure to me.
This old ezpublish bug may give a clue about the situation: the ezPublishRestBundle does not seem to find any "allowed method", and somehow overwrites the nelmio_cors config.
In the RestProvider.php file, if I force the getAllowedMethods method to return this:
return ["POST", "PUT", "GET", "DELETE", "OPTIONS"]; then I no longer get the 405 error but a very different problem (explained right below).
Problem 2: no session cookies allowed
With the hack above, I can get a little further: my requests are allowed but some of them still fail.
I noticed that no session cookie is passed with the requests (whereas it is when testing on the same domain).
This time it seems to come from the CAPI.js file: the XMLHttpRequest object never has its withCredentials property set to true.
If I add XHR.withCredentials = true; in CAPI.js before the request is sent, then it seems to be fine.
Conclusion
I really wonder whether the eZ Platform REST client was designed to be used cross-domain; it would be very surprising if not.
So I must be doing something wrong, and if someone can explain what, I would be extremely grateful :)
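To make the preflight failure concrete, here is a minimal sketch in Python of the two checks a browser applies to the OPTIONS response before sending the real request. preflight_allows is a hypothetical helper, and it is a simplification: it ignores the "*" wildcard and the header checks. GET, HEAD and POST are CORS-safelisted methods, so they do not need to be listed in Access-Control-Allow-Methods, which suggests the 405 status is what blocks the POST shown above.

```python
SAFELISTED_METHODS = {"GET", "HEAD", "POST"}


def preflight_allows(status, headers, method):
    """Return True if an OPTIONS preflight response lets `method` proceed."""
    if not 200 <= status < 300:
        return False  # a 405 preflight fails here, whatever the headers say
    allowed = headers.get("Access-Control-Allow-Methods", "")
    listed = {m.strip().upper() for m in allowed.split(",") if m.strip()}
    method = method.upper()
    return method in SAFELISTED_METHODS or method in listed
```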

Booted Off Local Server - 302 error

I'll start with the log that I am receiving below:
Dec.15.11.56-Rf: Incoming Request URL: /
Dec.15.11.56-Rf: SECURE GET Path: / From: mlocal.cldeals.com Rewritten: www.cldeals.com
Dec.15.11.56-Rf: Received 302 Found [text/html; charset=UTF-8] response for /
Dec.15.11.56-Rf: Sending 302 text/html; charset=UTF-8 response for /
Dec.15.11.56-Rf: Stats. Total: 0.52088702, Upstream: 0.48212701, Processing: 0.00105600, ProcessingOther: 0.04037500
Basically, when I go to mlocal.cldeals.com, it loads fine. If I click on another page, say mlocal.cldeals.com/products, that loads fine as well. The issue seems to occur when I go to the account page and try to switch back to the homepage; maybe it is some type of security issue? When I try to switch back to mlocal.cldeals.com, the home page, it boots me off and sends me to www.cldeals.com. Is there something I can add to prevent this from happening? Additionally, is this just a local server issue that would go away when I launch it on Moovweb's server? Any help is greatly appreciated.
Thank you.
It looks like the backend response to https://www.cldeals.com is a 302 to http://www.cldeals.com:80/. Not sure why that is the case (see note below *)
curl -v -o /dev/null https://www.cldeals.com
This response contains a hardcoded Location header and your project is passing along the response as is, which is why you are being booted off your local server.
Because the Location header value has a port specified, you'll need to modify your config.json to include this line in the mapping:
{
  "host_map": [
    "$.cldeals.com => www.cldeals.com",
    "$.cldeals.com => www.cldeals.com:80"
  ]
}
This way, the SDK knows to rewrite that specific host:port value. (By default all HTTP requests go through port 80, so the port in that header isn't really necessary.)
*This might be a bug in the backend implementation, because once you log in you should stay in HTTPS mode until you log out. (I can see some pages with personal information being transmitted over plain HTTP.)
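Why the extra host:port entry matters can be illustrated with a small Python sketch. This is not how the Moovweb SDK is implemented; rewrite_location and its parameters are hypothetical. The idea is only that a Location header is rewritten when its host part exactly matches a known upstream entry, so "www.cldeals.com:80" has to be listed alongside "www.cldeals.com".

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_location(location, upstream_hosts, local_host):
    """Point a redirect at `local_host` when it targets a known upstream host."""
    parts = urlsplit(location)
    if parts.netloc in upstream_hosts:
        parts = parts._replace(netloc=local_host)
    return urlunsplit(parts)
```

With only "www.cldeals.com" in upstream_hosts, the redirect to http://www.cldeals.com:80/ passes through unchanged; adding "www.cldeals.com:80" sends it to the local host instead.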