My recent assignment is to make a proxy in C using socket programming. The proxy only needs to support HTTP/1.0. After several hours of work, I have made a proxy that can be used with Chromium. Various websites can be loaded, such as Google and several .edu websites; however, many websites give me a 404 "page not found" error (these links work fine when not going through my proxy). These 404 errors even occur on the root address "/" of a site... which doesn't make sense.
Could this be a problem with my HTTP request? The HTTP request sent from the browser is parsed for the HTTP request method, hostname, and port. For example, if a GET request is parsed from the browser, a TCP connection is established to the hostname and port provided, and the HTTP GET request is sent in the following format:
GET /path/name/item.html HTTP/1.0\r\n\r\n
This format works for a small number of websites, but the rest respond with a 404 error. Could this be the problem? If not, what else could possibly be causing it?
Any help would be greatly appreciated.
One likely explanation is the fact that you've designed an HTTP/1.0 proxy, whereas any website on shared hosting will only work with HTTP/1.1 these days (well, not quite, but I'll get to that in a second).
This isn't the only possible problem by a long way, but you'll have to give an example of a website which is failing like this to get some more ideas.
You seem to understand the basics of HTTP: the client makes a TCP connection to the server and sends an HTTP request over it, which consists of a request line (such as GET /path/name/item.html HTTP/1.0) and then a set of optional header lines, all separated by CRLF (i.e. \r\n). The whole lot is ended with two consecutive CRLF sequences, at which point the server at the other end matches the request up with a resource and sends back an appropriate response. Resources are all identified by a path (e.g. /path/name/item.html), which could be a real file or a dynamic page.
That much of HTTP has stayed pretty much unchanged since it was first invented. However, think about how the client finds the server to connect to. What you give it is a URL, like this:
http://www.example.com/path/name/item.html
From this it looks at the scheme, which is http, so it knows it's making an HTTP connection. The next part is the hostname. Under the original HTTP the assumption was that each hostname resolved to its own IP address, and then the client connects to that IP address and makes the request. Since every server only had one website in those days, this worked fine.
As the number of websites increased, however, it became difficult to give every website a different IP address, particularly as many websites were so simple that they could easily be shared on the same physical machine. It was easy to point multiple domains at the same IP address (the DNS system makes this really simple), but when the server received the TCP request it would just know it had a request to its IP address - it wouldn't know which website to send back. So, a new Host header was added so that the client could indicate in the request itself which hostname it was requesting. This meant that one server could host lots of websites, and the webserver could use the Host header to tell which one to serve in the response.
These days this is very common - if you don't use the Host header then a number of websites won't know which site you're asking for. What usually happens is they assume some default website from the list they've got, and the chances are this won't have the file you're asking for. Even if you're asking for /, if you don't provide the Host header then the webserver may give you a 404 anyway, if it's configured that way - this isn't unreasonable if there isn't a sensible default website to give you.
You can find the description of the Host header in the HTTP RFC if you want more technical details.
Also, it's possible that websites just plain refuse HTTP/1.0 - I would be slightly surprised if that happened on so many websites, but you never know. Still, try the Host header first.
Contrary to what some people believe, there's nothing to stop you using the Host header with HTTP/1.0, although you might still find some servers which don't like that. It's a little easier than supporting full HTTP/1.1, which requires that you understand chunked encoding and other complexities, although for simple example code you could probably get away with just adding the Host header and calling it HTTP/1.1 (I wouldn't suggest this is adequate for production code, however).
Anyway, you can try adding the Host header to make your request like this:
GET /path/name/item.html HTTP/1.0\r\n
Host: www.example.com\r\n
\r\n
I've split it across lines just for easy reading - you can see there's still the blank line at the end.
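If it helps, here's a minimal sketch in C of how your proxy might build that request - the function name, buffer handling, and variable names are just illustrative, not a prescription:

#include <stdio.h>

/* Build an HTTP/1.0 request that includes the Host header.
 * path and host come from parsing the browser's request;
 * everything else here is illustrative. */
int build_request(char *buf, size_t buflen,
                  const char *path, const char *host)
{
    int n = snprintf(buf, buflen,
                     "GET %s HTTP/1.0\r\n"
                     "Host: %s\r\n"
                     "\r\n",
                     path, host);
    /* snprintf reports the length it wanted to write; treat
     * truncation as an error rather than sending a broken request. */
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return n;  /* number of bytes to send() */
}

You'd then send() those n bytes over your existing TCP connection to the origin server, exactly as you do now.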
Even if this isn't causing the problem you're seeing, the Host header is a really good idea these days, as there are definitely sites that won't work without it. If you're still having problems then give me an example of a site which doesn't work for you and we can try to work out why.
If anything I've said is unclear or needs more detail, just ask.
I discovered HTTP as a nice way to handle my files on my server. I write C programs based on the sockets interface.
When I issue an HTTP GET, I can easily download files, but only files with known extensions. A (backup) file with the extension XXX is "not found": the response status code is actually 200 ("OK"), but the response content is an HTML page containing the error message (404 = not found).
How can I make sure that the web server sends any file I ask for? I have experimented with the Accept header in the HTTP GET request, but that does not help (or I am making a mistake).
I do not own the server, so I cannot alter the server settings. On the client side, I do not use a browser, only the sockets interface (see above).
I think it is important to understand that HTTP does not really have a concept of "files" and "directories." Instead, the protocol operates on locations and resources. While they can represent files and directories, they are absolutely not guaranteed to be the same.
The server in question seems to be configured to serve 404 error pages when encountering unknown extensions. This is a bit weird and absolutely not standard behavior, though it may happen if a web application firewall is deployed. Again, HTTP does not trust file extensions in any way but relies on metadata in the form of MIME media types instead. That is also what goes (more or less) into the Accept header of a request.
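For what it's worth, a request that expresses no preference at all looks something like this (the path and hostname are made up for illustration):

GET /backup/file.XXX HTTP/1.0\r\n
Host: www.example.com\r\n
Accept: */*\r\n
\r\n

Even with Accept: */* the server is still free to answer however it has been configured to.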
How can I make sure that the web server sends any file I ask for?
Well, you can't. While the client may express preferences, the server is the ultimate authority on what gets sent in which way.
I was surprised to read this in the JMAP spec:
A JMAP-supporting email host for the domain example.com SHOULD publish a SRV record _jmaps._tcp.example.com which gives a hostname and port (usually port 443).
The authentication URL is https://hostname/.well-known/jmap (following any redirects).
Other autodiscovery options using autoconfig.example.com or autodiscover.example.com may be added to a future version of JMAP to support clients which can’t use SRV lookup.
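(For concreteness, as I understand it that SRV record would look something like this in zone-file form, where the TTL, priority/weight, and target host are my own made-up values:

_jmaps._tcp.example.com. 86400 IN SRV 0 1 443 jmap.example.com.

i.e. "connect to jmap.example.com on port 443".)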
It doesn't match the original use cases for the well-known URI registry. Stuff like robots.txt, or dnt / dnt-policy.txt. And IPP / CUPS printing works fine without it, using a DNS TXT record to specify a URL. If you can look up SRV records, you can equally look up TXT. And the autodiscovery protocol involves XML which can obviously include a full URI.
E.g. what chance is there of this being accepted by the registry of well-known URIs? Or is it more likely to remain as something non-standard, like made-up URI schemes?
The idea almost certainly came from CalDAV, which is already in the registry of well-known URIs. RFC 6764 defines both DNS SRV and DNS TXT records as well as a well-known URI. So JMAP's proposal is perfectly well-founded.
It might sound strange that authentication happens against a URL, but this too is justified by CalDAV. I think it helps shard users between multiple servers.
IMO it's not a good way to use SRV. On the other hand, JMAP is specifically considering clients that can't use SRV. One presumes the CalDAV usage is there for similar reasons.
It does seem bizarre that presumably web-centric implementations can't manage to discover full URIs (i.e. if they're using the autoconfig protocol).
I think you have to remember that these approaches start from user email addresses. The hallowed Web Architecture using HTTP URIs for everything... well, let's say it doesn't have much to say about mailto: URIs. DNS has got to be the "right" way to bridge the gap from domains to URIs. But in a web-centric world where you don't necessarily know how to resolve DNS, or only how to look up IPs to speak HTTP with? There are going to be some compromises.
So, I have read that it is possible to trace the IP of a Facebook friend while talking to him on chat by using Fiddler and Firebug. Now, as far as I know, Facebook uses HTTPS, and all in all, I cannot seem to get anything precise from Fiddler.
Would anyone be kind enough to explain if this is really possible and, if so, how the process goes?
I don't have a direct answer to your question, but I can give you some guidelines:
If the chat works as a peer-to-peer network (which I highly doubt), you can trace the incoming TCP/UDP connection, search it for the message using a sniffing program (like Wireshark), and fetch the IP from there.
If every message goes through a server (which is probably based on a server-oriented messaging model - not that it matters), there is virtually no way to figure out the IP, because the servers act as a proxy and mask the original IP. However, if Facebook includes some sort of metadata (which they do for location on phones, etc.), it might contain the sender's IP and other details like the MAC address. I'm not sure about that, but it's a good place to look.
If you want help with Firebug and all those other HTML/HTTP/browser development tools, there are plenty of tutorials out there. If you already know how to use them, you might want to check the resources that are loaded when a message is sent. From experience I can tell you that when a comment is added to something, a whole bunch of requests fire that append content to the current HTML document, but I have never seen any metadata come through that way. Anyway, it's a good place to start.
I am still learning about SIP and all its protocols, specifically trying to integrate PJSIP into an iPhone application to make p2p calls.
I have a question about a peer-to-peer connection using PJSUA. I am able to make calls perfectly to other clients on my local network by calling directly using the URI:
sip:192.*.*.*:5060
I am curious whether this will work for making direct calls to other SIP URIs that are not on the local network without using server configuration - if not this way, is there another way of making p2p calls without server configuration?
Thanks in advance.
You can make calls without server configuration, as a general principle, but something needs configuring. As mattjgalloway points out in the comments below your question, the most robust solution is a can of worms involving ICE, which provides a kind of "umbrella" protocol for things like STUN.
Last time I touched this issue, I had the requirement that I couldn't use internet-based SIP servers to help. I came up with the idea of a registry of sorts: your client can define a bunch of "address spaces" with particular routing requirements. For SIP URIs in your LAN, you define no routing; for URIs in your company's VPN-accessed network, you define a route passing through your VPN connection; for everything else you define a route through your internet router.
By "define a route", I mean that when you place a call to a URI in some particular address space, you store what IP will go into a Contact header, what Route headers you might need, and so on.
Thus, the process of making a call becomes:
Look for a match in the set of address spaces.
Ask that address space for the suitable bits needed to make a workable INVITE (appropriate Contact header details, Route headers, etc.)
Construct a normal INVITE, mutating it as necessary with the results of the previous step.
Send the INVITE as normal.
This essentially reproduces half of what ICE would give you, in a manually administered form. "Half", because this ensures that one SIP agent can make calls such that the SIP routing all works. The missing half is that you still need some kind of registrar somewhere, and each agent in your contact list needs the necessary setup to receive incoming calls. (If an agent is behind a NATting internet router, the router would need to either run a SIP proxy, or forward ports 5060 and 5061 to a particular machine, which might be an agent or a proxy serving the LAN's agents.)
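To make that a little more concrete, here's a rough C sketch of what such an address-space registry might look like. All the names, addresses, and the simple prefix-matching rule are hypothetical; it's just to show the shape of the idea:

#include <stdio.h>
#include <string.h>

/* One "address space": target URIs matching prefix get these
 * routing bits. All values here are hypothetical examples. */
struct addr_space {
    const char *prefix;        /* e.g. LAN targets                */
    const char *contact_host;  /* what goes in the Contact header */
    const char *route;         /* optional Route header, or NULL  */
};

static const struct addr_space spaces[] = {
    { "sip:192.168.", "192.168.1.10:5060", NULL },
    { "sip:10.",      "10.8.0.3:5060",     "<sip:vpn-gw.example.com;lr>" },
    { "sip:",         "203.0.113.7:5060",  NULL }, /* catch-all: internet */
};

/* Return the first address space whose prefix matches the target URI. */
static const struct addr_space *lookup(const char *uri)
{
    for (size_t i = 0; i < sizeof spaces / sizeof spaces[0]; i++)
        if (strncmp(uri, spaces[i].prefix, strlen(spaces[i].prefix)) == 0)
            return &spaces[i];
    return NULL;
}

int main(void)
{
    const struct addr_space *as = lookup("sip:192.168.1.42:5060");
    if (as != NULL)
        printf("Contact: <sip:%s>\n", as->contact_host);
    return 0;
}

The real work is then steps 2-4 above: taking those stored fields and folding them into the INVITE your SIP stack builds.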
It is, indeed, a large can of worms.
The basic issue is to solve the problem of getting transport ports anywhere on the internet for multimedia traffic.
Many companies and experts have tried to solve this problem. A possible way out is to buy a domain, set up a basic registrar using YATE or Asterisk on an address accessible from the internet, and configure it to also use ICE as needed. Your iPhone application at both ends could register to it automatically on startup, and then make P2P calls.
I have an otherwise working iPhone program. A recent change means that it generates some very long URLs (over 4000 characters sometimes) which I know isn't a great idea and I know how to fix -- that's not what I'm asking here.
The curious thing is that when I make the connection using a 3G network (Vodafone UK) I get this HTTP "414 Request-URI Too Long" error but when I connect using my local WiFi connection it works just fine.
Why would I get different results using different types of network? Could they be routing requests to different servers depending on where the connection originates? Or is there something else at stake here?
The corollary questions relate to how common this is. Is it likely to happen whenever I use a cell network or just some networks?
I would suspect that your 3G requests are being passed through some proxy which doesn't fancy 4000-character URLs and returns an HTTP 414 error.
I suspect the Vodafone connection is going through a proxy and/or gateway that can't handle the extra-long URL, and that your 414 Request-URI Too Long is coming from it.
Some wireless operators - including Vodafone UK, I believe - deploy inline proxies that transparently intercept your HTTP requests for purposes of optimization. Some of these proxies are based on software like the Squid proxy cache, which can have problems with very long URLs. As a result, your requests might not even be making it to your server.
To work around this issue, you can try sending your HTTP requests to the server on a non-standard TCP port. Generally speaking, these proxies are only configured to perform HTTP processing on port 80. Thus, if you can send your traffic on a different port, it might make it through unscathed.
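For example, if you can run your server on port 8080 (a placeholder; any non-80 port behaves the same way), the client just uses a URL like http://www.example.com:8080/... and the port travels in the Host header:

GET /some/very/long/path?with=lots&of=params HTTP/1.1\r\n
Host: www.example.com:8080\r\n
\r\n

A transparent proxy that only intercepts port-80 traffic would typically never see this request.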