URLConnection is returning text/html content type - httpurlconnection

I am trying to get the InputStream of an online PDF file, but it is not working. URLConnection reports the content type of the URL as text/html instead of application/pdf, even though the URL https://www.dropbox.com/s/ao3up7xudju4qm0/Amalgabond%20Adhesive%20Agent.pdf clearly points to a PDF.
I am using the following code to open the URLConnection and read the content type:
URL fileUrl;
try {
    String str = "https://www.dropbox.com/s/ao3up7xudju4qm0/Amalgabond%20Adhesive%20Agent.pdf";
    fileUrl = new URL(str);
    URLConnection connection = fileUrl.openConnection();
    Log.i("mustang", "Content-type: " + connection.getContentType());
    InputStream is = fileUrl.openStream();
    Log.i("mustang", "is.available(): " + is.available());
} catch (IOException e) {
    // MalformedURLException is a subclass of IOException
    Log.e("mustang", "Request failed", e);
}
Because of this, I am unable to parse the buffer. Why am I getting a text/html content type?
Thanks,

Dropbox uses user-agent sniffing to decide whether to display a lightbox (an HTML preview of the PDF). What you are seeing is the lightbox page, which you could confirm by printing the content.
To get the file itself, specify a non-interactive user agent such as Wget before reading from the connection:
URLConnection connection = fileUrl.openConnection();
connection.setRequestProperty("User-Agent", "Wget/5.0");
This generally bypasses Dropbox's content preview code.

How to send an Arabic word with http multipart form data in Flutter

I'm using the http package to send a multipart form-data request to the server. It works well, and the server is able to parse the files and fields. The problem appears when one of the fields contains an Arabic word: the server then parses that field as if it were a file and does not recognize it as a regular field. When I print the part containing the Arabic word on the server, its mimetype is shown as 'text/plain; utf=8', but it is still placed with the files. I don't know why, even though I'm setting this header:
..headers['Content-Type'] = "multipart/form-data; charset=UTF-8"
The full request:
var request = http.MultipartRequest('POST', postUri)
  ..fields['username'] = _username!
  ..fields['password'] = _password!
  ..fields['address'] = 'تشيتتشسيتش يسشتيرشتي' // Here is the problem
  ..headers['Content-Type'] = "multipart/form-data; charset=UTF-8"
  ..files.add(await http.MultipartFile.fromPath(
    'coverPic',
    _coverImage!.path,
    contentType: MediaType('image', coverPicType),
  ));

Jersey Client - send gzip file got error: The magic number in GZip header is not correct. Make sure you are passing in a GZip stream

I have to call a web service that requires the header ("x-ms-blob-type", "blockblob") and a gzip file in the body of a PUT request. I am trying to use FormDataMultiPart to hold the file and send it as an Entity.
The server responded that it received the file, but when it tried to process the file it returned this error:
"Error parsing file: The magic number in GZip header is not correct. Make sure you are passing in a GZip stream."
This is what I did:
final FileDataBodyPart filePart = new FileDataBodyPart("file", new File("/Users/Downloads/bulk.gz"), MediaType.APPLICATION_OCTET_STREAM_TYPE);
FormDataMultiPart formDataMultiPart = new FormDataMultiPart();
final FormDataMultiPart multipart = (FormDataMultiPart) formDataMultiPart.bodyPart(filePart);
final Response response = target.request()
    .header("Content-Type", "application/x-gzip")
    .header("x-ms-blob-type", "blockblob")
    .put(Entity.entity(multipart, multipart.getMediaType()));
What should I do? I think it would be better to pass a gzip input stream to the request instead of using the multipart form.
Best Regards,
Robert
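
One possible explanation, stated here as an assumption because the server's processing is not shown: a gzip stream must begin with the two magic bytes 0x1f 0x8b, while a multipart body begins with a boundary line and part headers, so a server that treats the whole request body as the file would not find the magic number. The sketch below illustrates this in Python rather than Jersey; the boundary string is hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

compressed = gzip.compress(b"example payload")
print(compressed[:2] == GZIP_MAGIC)  # True

# Hypothetical multipart body: the boundary and part headers precede the file bytes.
boundary = b"Boundary_1"
multipart_body = b"".join([
    b"--", boundary, b"\r\n",
    b'Content-Disposition: form-data; name="file"; filename="bulk.gz"\r\n',
    b"Content-Type: application/octet-stream\r\n\r\n",
    compressed,
    b"\r\n--", boundary, b"--\r\n",
])
print(multipart_body[:2] == GZIP_MAGIC)  # False
```

If this is indeed the cause, sending the raw gzip bytes as the request body, as the question itself suggests, would keep the magic number at the start of the payload.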

Need to find the requests equivalent of urlopen() from urllib2

I am currently trying to modify a script to use the requests library instead of the urllib2 library. I haven't really used it before, and I am looking for the equivalent of urlopen("http://www.example.org").read(), so I tried requests.get("http://www.example.org").text.
This works fine with ordinary HTML; however, when I fetch from this URL (https://gtfsrt.api.translink.com.au/Feed/SEQ) it doesn't seem to work.
So I wrote the code below to print out the responses from the same URL using both the requests and urllib2 libraries.
import urllib2
import requests
#urllib2 request
request = urllib2.Request("https://gtfsrt.api.translink.com.au/Feed/SEQ")
result = urllib2.urlopen(request)
#requests request
result2 = requests.get("https://gtfsrt.api.translink.com.au/Feed/SEQ")
print result2.encoding
#urllib2 write to text
open("Output.txt", 'w').close()
text_file = open("Output.txt", "w")
text_file.write(result.read())
text_file.close()
open("Output2.txt", 'w').close()
text_file = open("Output2.txt", "w")
text_file.write(result2.text)
text_file.close()
The urlopen().read() call works fine, but requests.get().text doesn't work for this URL. I suspect it has something to do with encoding, but I don't know what. Any thoughts?
Note: The supplied URL is a feed in the Google protocol buffer format; once I receive the message, I pass the feed to a Google library that interprets it.
Your issue is that you're making the requests module interpret binary content in a response as text.
A response from the requests library has two main ways to access the body of the response:
Response.content - returns the response body as a bytestring
Response.text - decodes the response body as text and returns unicode
Since protocol buffers are a binary format, you should use result2.content in your code instead of result2.text.
Response.content returns the body of the response as-is, in bytes. For binary content this is exactly what you want. For text content that contains non-ASCII characters, the server must have encoded the text into a bytestring using a particular encoding, indicated by either an HTTP header or a <meta charset="..." /> tag. To make sense of those bytes, they need to be decoded using that charset after they are received.
Response.text is a convenience property that does exactly this for you. It assumes the response body is text, looks at the response headers to find the encoding, and decodes the body for you, returning unicode.
But if your response doesn't contain text, this is the wrong accessor to use. Binary content doesn't contain characters, so the whole concept of character encoding does not make sense for it; it only applies to text composed of characters. (That's also why you're seeing response.encoding == None: it's just bytes, and no character encoding is involved.)
See Response Content and Binary Response Content in the requests documentation for more details.
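
The difference can be reproduced without a network call. The following is a minimal Python 3 sketch; the payload bytes are made up for illustration and are not taken from the feed in the question. It shows that decoding arbitrary binary data as text and encoding it again does not give back the original bytes, which is why a binary parser cannot work from a text view of the body.

```python
# Hypothetical binary payload: a few bytes that are not valid UTF-8.
payload = bytes([0x0A, 0x03, 0xFF, 0xFE, 0x80])

# Roughly what a text accessor does: decode, replacing undecodable bytes.
as_text = payload.decode("utf-8", errors="replace")

# Re-encoding the text does not restore the original bytes.
round_tripped = as_text.encode("utf-8")

print(payload == round_tripped)  # False
```

Working with the raw bytes from start to finish avoids this lossy round trip.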

Send a special parameter via RequestBuilder POST on GWT

I am a beginner with GWT.
In my application, I need to POST a parameter whose value is a URL, such as the following string:
'http://h.com/a.php?code=186&cate_code=MV&album=acce'
As you can see, it includes the character sequence '&cate_code='.
As far as I know, &parametername=value is the form of a single parameter.
Because of this, the PHP file on my server only receives the following string:
'http://h.com/a.php?code=186'
How can I handle this situation? I want to receive the full URL as a parameter in the server-side PHP.
Please help me.
Thanks in advance.
My code:
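String name = "John";
String url = "http://h.com/a.php?code=186&cate_code=MV&album=acce";
String parameter = "name=" + name + "&url=" + url;
builder.setHeader("Content-Type", "application/x-www-form-urlencoded");
builder.sendRequest(parameter, new RequestCallback() {
    @Override
    public void onResponseReceived(Request request, Response response) {
        // handle the response
    }
    @Override
    public void onError(Request request, Throwable exception) {
        // handle the error
    }
});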
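Use URL.encodeQueryString(url) when building the parameter string, so that each & inside the value is turned into %26 (0x26 being the UTF-8 byte for &). The server then decodes it back as part of the url value instead of treating it as a parameter separator.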
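
The effect of the encoding can be checked in a few lines. This sketch uses Python's standard urllib.parse rather than GWT, purely as an illustration of form encoding; the parameter names and values are the ones from the question.

```python
from urllib.parse import quote, parse_qs

name = "John"
url = "http://h.com/a.php?code=186&cate_code=MV&album=acce"

# Without encoding, the & characters inside the URL split it into extra parameters.
raw_body = "name=" + name + "&url=" + url
print(parse_qs(raw_body)["url"])      # ['http://h.com/a.php?code=186']

# Percent-encoding the value keeps it intact as a single parameter.
encoded_body = "name=" + quote(name, safe="") + "&url=" + quote(url, safe="")
print(parse_qs(encoded_body)["url"])  # ['http://h.com/a.php?code=186&cate_code=MV&album=acce']
```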

Encoding not present in HTTP header, how to find it in HTML header? (iPhone)

I'm writing a browser for the iPhone.
I'm using
NSString* storyHTML = @"";
ASIHTTPRequest *request = [ASIHTTPRequest requestWithURL:url];
[request startSynchronous];
to download HTML. The problem is that sometimes there is no encoding in the HTTP header, in which case the above code defaults to ISO Latin-1.
In this case I can read up to the header in the HTML and find the meta tag that specifies the actual encoding, which looks something like this:
<meta http-equiv="content-type" content="application/xhtml+xml; charset=UTF-8" />
The problem is that there are a huge number of possible encodings that can appear in the meta tag, as seen here: http://www.iana.org/assignments/character-sets
I would need to somehow convert one of those encoding strings into one of the constant encodings found in the NSString class:
enum {
NSASCIIStringEncoding = 1,
NSNEXTSTEPStringEncoding = 2,
NSJapaneseEUCStringEncoding = 3,
NSUTF8StringEncoding = 4,
NSISOLatin1StringEncoding = 5, ...
There must be a class that somehow determines the encoding of HTML for you. Is there a way to look into UIWebView and see how they do it?
It seems like downloading HTML should be easy, what am I missing?
Thanks!
Just going to round up my comments and add a few final words of advice in an answer.
Comment 1:
For general usage, you can use ASIHTTPRequest's -responseString; otherwise you can take the raw data and use your own logic to figure out the encoding (UTF-8, UTF-16, etc.).
Comment 2:
From the ASIHTTP website:
ASIHTTPRequest will attempt to read the text encoding of the received data from the Content-Type header. If it finds a text encoding, it will set responseEncoding to the appropriate NSStringEncoding. If it does not find a text encoding in the header, it will use the value of defaultResponseEncoding (this defaults to NSISOLatin1StringEncoding).
When you call [request responseString], ASIHTTPRequest will attempt to create a string from the data it received, using responseEncoding as the source encoding.
Comment 3:
See also: Encoding issue with ASIHttpRequest
I would personally recommend taking the response data and just assuming the content can be decoded as UTF-16 (or UTF-8). Of course, you could also use a regular expression or an HTML parser to grab the <meta> tag inside the <head> element, but if the response is in an unusual encoding you might not be able to find the string @"<head".
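
The regular-expression approach mentioned above could look roughly like the following. This is a sketch in Python rather than Objective-C, and the function name sniff_charset, the 1024-byte scan window, and the UTF-8 fallback are all assumptions made for illustration; a real browser engine uses a more elaborate algorithm.

```python
import codecs
import re

# Matches charset=... inside a <meta> tag. It works on raw bytes because the
# tag itself is plain ASCII in every ASCII-compatible encoding.
META_CHARSET = re.compile(
    rb"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE
)

def sniff_charset(body: bytes, default: str = "utf-8") -> str:
    """Return the codec name declared in a <meta> tag, or the default."""
    match = META_CHARSET.search(body[:1024])  # only scan the start of the document
    if not match:
        return default
    label = match.group(1).decode("ascii")
    try:
        # codecs.lookup maps many aliases to one canonical codec name
        return codecs.lookup(label).name
    except LookupError:
        return default

html = (b'<html><head><meta http-equiv="content-type" '
        b'content="application/xhtml+xml; charset=UTF-8" /></head></html>')
print(sniff_charset(html))  # utf-8
```

The lookup step plays the role of the mapping asked about in the question: it turns a free-form charset label into one of a fixed set of known codecs, and falls back to a default when the label is unknown.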
I would also use curl from the command line on your computer to see what Content-Type header ASIHTTPRequest is receiving. If you run a command like
curl -I "http://www.google.com/"
you'll get a response like the following:
HTTP/1.1 200 OK
Date: Tue, 09 Aug 2011 20:05:00 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
It would appear that almost all sites respond correctly with this header, and when they don't, I think UTF-8 would be a good bet. Could you comment with a link to the site that was giving you the issue?
"Is there a way to look into UIWebView and see how they do it?"
There is. UIWebView is a wrapper around WebKit, which is an open source project. You can check out the source code or browse it online.