HTTP GET file extension support - sockets

I discovered HTTP as a nice way to handle my files on my server. I write C programs based on the sockets interface.
When I issue an HTTP GET, I can easily download files, but only files with known extensions. A (backup) file with the extension XXX is "not found": the response status code is actually 200 ("OK"), but the response body is an HTML page containing the error message (404 = not found).
How can I make sure that the web server sends any file I ask for? I have experimented with the Accept header in the HTTP GET request, but that does not help (or I am making a mistake).
I do not own the server, so I cannot alter the server settings. On the client side, I do not use a browser, only the sockets interface (see above).

I think it is important to understand that HTTP does not really have a concept of "files" and "directories." Instead, the protocol operates on locations and resources. While they can represent files and directories, they are absolutely not guaranteed to be the same.
The server in question seems to be configured to serve an error page (while still returning 200) when it encounters unknown extensions. This is a bit odd and certainly not standard behaviour, though it can happen when a Web Application Firewall is deployed. Again, HTTP does not rely on file extensions in any way; it relies on metadata in the form of MIME media types instead. That is also what goes (more or less) into the Accept header of a request.
How can I make sure that the web server sends any file I ask for?
Well, you can't. While the client may express preferences, the server is the ultimate authority on what gets sent in which way.
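For completeness, here is a minimal sketch of what such a request could look like over a plain socket (the host name and the /backup/data.XXX path are placeholders, not anything from your setup). Even with a wildcard Accept header, the header only states a preference:
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    // Resolve and connect (error handling kept minimal for brevity).
    struct addrinfo hints = {0}, *res;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo("www.example.com", "80", &hints, &res) != 0)
        return 1;
    int s = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (s < 0 || connect(s, res->ai_addr, res->ai_addrlen) != 0)
        return 1;

    // The Accept header expresses a preference; the server still decides.
    const char *req =
        "GET /backup/data.XXX HTTP/1.1\r\n"
        "Host: www.example.com\r\n"
        "Accept: */*\r\n"
        "Connection: close\r\n"
        "\r\n";
    send(s, req, strlen(req), 0);

    // Dump whatever the server returns: headers first, then the body.
    char buf[4096];
    ssize_t n;
    while ((n = recv(s, buf, sizeof(buf), 0)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);

    freeaddrinfo(res);
    close(s);
    return 0;
}
If the server (or a WAF in front of it) is configured to hide files with unknown extensions, this request will still come back as the HTML error page described in the question.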

Related

When developing a new VSCode extension, how do I generate a Language Server Protocol request and handle the response?

When using the LSP libraries provided by Microsoft, how do I generate the request JSON on the client and send it to the server? And how do I then handle the JSON data the server responds with?
I've read the official documentation but couldn't find out how. All I want is to get the function definition as a text string instead of just showing it in a "hover".
VSCode sends the requests to your language server for you. For example, the initialize request is sent to the server as soon as the extension is started on the client side.
The server then has to implement logic to handle the JSON payload sent by the client and return a response that conforms to the LSP specification. I would suggest you turn on tracing in VSCode to see the messages being sent/received by the client and server. For lsp-sample, you can enable tracing by setting "languageServerExample.trace.server": "verbose" in your VSCode settings (the setting itself is declared in the sample's package.json).
In terms of your question regarding function definition text string, I'm assuming you somehow want this in your client code(?) If this is the case, you would have to extract the function definition string in your hover handler, since that's how the server is sending that information over.

C Programming - Sending HTTP Request

My recent assignment is to make a proxy in C using socket programming. The proxy only needs to be built using HTTP/1.0. After several hours of work, I have made a proxy that can be used with Chromium. Various websites can be loaded, such as Google and several .edu websites; however, many websites give me a 404 error for page not found (these links work fine when not going through my proxy). These 404 errors even occur on the root address "/" of a site... which doesn't make sense.
Could this be a problem with my HTTP request? The HTTP request sent from the browser is parsed for the HTTP request method, hostname, and port. For example, if a GET request is parsed from the browser, a TCP connection is established to the hostname and port provided, and the HTTP GET request is sent in the following format:
GET /path/name/item.html HTTP/1.0\r\n\r\n
This format works for a small amount of websites, but a 404 error message is created for the rest. Could this be the problem? If not, what else could possibly be giving me this problem?
Any help would be greatly appreciated.
One likely explanation is that you've designed an HTTP/1.0 proxy, whereas any website on shared hosting will only work with HTTP/1.1 these days (well, not quite, but I'll get to that in a second).
This isn't the only possible problem by a long way, but you'll have to give an example of a website which is failing like this to get some more ideas.
You seem to understand the basics of HTTP: that the client makes a TCP connection to the server and sends an HTTP request over it, which consists of a request line (such as GET /path/name/item.html HTTP/1.0) and then a set of optional header lines, all separated by CRLF (i.e. \r\n). The whole lot is ended with two consecutive CRLF sequences, at which point the server at the other end matches up the request with a resource and sends back an appropriate response. Resources are all identified by a path (e.g. /path/name/item.html) which could be a real file, or it could be a dynamic page.
That much of HTTP has stayed pretty much unchanged since it was first invented. However, think about how the client finds the server to connect to. What you give it is a URL, like this:
http://www.example.com/path/name/item.html
From this it looks at the scheme, which is http, so it knows it's making an HTTP connection. The next part is the hostname. Under the original HTTP the assumption was that each hostname resolved to its own IP address, and then the client connects to that IP address and makes the request. Since every server only had one website in those days, this worked fine.
As the number of websites increased, however, it became difficult to give every website a different IP address, particularly as many websites were so simple that they could easily be shared on the same physical machine. It was easy to point multiple domains at the same IP address (the DNS system makes this really simple), but when the server received the TCP request it would just know it had a request to its IP address - it wouldn't know which website to send back. So, a new Host header was added so that the client could indicate in the request itself which hostname it was requesting. This meant that one server could host lots of websites, and the webserver could use the Host header to tell which one to serve in the response.
These days this is very common - if you don't use the Host header then a number of servers won't know which website you're asking for. What usually happens is that they assume some default website from the list they've got, and the chances are this won't have the file you're asking for. Even if you're asking for /, if you don't provide the Host header then the webserver may give you a 404 anyway, if it's configured that way - this isn't unreasonable if there isn't a sensible default website to give you.
You can find the description of the Host header in the HTTP RFC if you want more technical details.
Also, it's possible that websites just plain refuse HTTP/1.0 - I would be slightly surprised if that happened on so many websites, but you never know. Still, try the Host header first.
Contrary to what some people believe there's nothing to stop you using the Host header with HTTP/1.0, although you might still find some servers which don't like that. It's a little easier than supporting full HTTP/1.1, which requires that you understand chunked encoding and other complexities, although for simple example code you could probably get away with just adding the Host header and calling it HTTP/1.1 (I wouldn't suggest this is adequate for production code, however).
Anyway, you can try adding the Host header to make your request like this:
GET /path/name/item.html HTTP/1.0\r\n
Host: www.example.com\r\n
\r\n
I've split it across lines just for easy reading - you can see there's still the blank line at the end.
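If it helps, here is a rough C sketch of composing and sending that request from your proxy. It assumes you already have a TCP socket connected to the origin server; the send_request() name is just made up for illustration, so adapt it to however your code is structured:
#include <stdio.h>
#include <sys/socket.h>

/* Compose the outgoing request with a Host header and send it.
 * `sock` is assumed to be already connected to the origin server. */
static int send_request(int sock, const char *host, const char *path)
{
    char req[2048];
    int len = snprintf(req, sizeof(req),
                       "GET %s HTTP/1.0\r\n"
                       "Host: %s\r\n"
                       "\r\n",              /* blank line ends the request */
                       path, host);
    if (len < 0 || len >= (int)sizeof(req))
        return -1;                          /* host/path too long for the buffer */
    return send(sock, req, (size_t)len, 0) == len ? 0 : -1;
}
A call like send_request(server_sock, "www.example.com", "/path/name/item.html") would then put exactly the request shown above on the wire.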
Even if this isn't causing the problem you're seeing, the Host header is a really good idea these days as there are definitely sites that won't work without it. If you're still having problems then give me an example of a site which doesn't work for you and we can try and work out why.
If anything I've said is unclear or needs more detail, just ask.

See what website the user is visiting in a browser independent way

I am trying to build an application that can inform a user about website specific information whenever they are visiting a website that is present in my database. This must be done in a browser independent way so the user will always see the information when visiting a website (no matter what browser or other tool he or she is using to visit the website).
My first (partially successful) approach was to look at the data packets using the System.Net.Sockets.Socket class etc. Unfortunately I discovered that this approach only works when the user has administrator rights. And of course, that is not what I want. My goal is that the user can install one relatively simple program that can be used right away.
After this I went looking for alternatives and found a lot about WinPcap and some of its .NET wrappers (did I tell you I am programming C# .NET already?). But with WinPcap I found out that it must be installed on the user's PC and there is no way to just reference some dll files and code away. I already looked at including WinPcap as a prerequisite in my installer, but that is also too cumbersome.
Well, long story short: I want to know in my application what website my user is visiting at the moment it is happening. I think it must be done by looking at the network's data packets, but I can't find a good solution for this. My application is built in C# .NET (4.0).
You could use Fiddler to monitor Internet traffic.
It is
a Web Debugging Proxy which logs all HTTP(S) traffic between your computer and the Internet. Fiddler allows you to inspect traffic, set breakpoints, and "fiddle" with incoming or outgoing data. Fiddler includes a powerful event-based scripting subsystem, and can be extended using any .NET language.
It's scriptable and can be readily used from .NET.
One simple idea: Instead of monitoring the traffic directly, what about installing a browser extension that sends you the current url of the page. Then you can check if that url is in your database and optionally show the user a message using the browser extension.
This is how extensions like Invisible Hand work... It scans the current page and sends relevant data back to the server for processing. If it finds anything, it uses the browser extension framework to communicate those results back to the user. (Using an alert, or a bar across the top of the window, etc.)
For a good start, Wireshark will do what you want.
You can specify a filter to isolate and view HTTP streams.
The best part is that Wireshark is open source, and built upon another program's API, WinPcap, which is also open source.
I'm guessing this is what you want:
1. capture network data off the wire
2. view the TCP traffic of a computer, isolate and save (in part or in whole) HTTP data
3. store information about the HTTP connections
Number 1 there is easy: you can google for a WinPcap tutorial, or just use some of their sample programs to capture the data (a rough sketch follows below).
I recommend you study up on the pcap file format; everything with WinPcap uses this basic format and its structures.
Now you have to learn how to take a TCP stream and turn it into a solid data stream without corruption or disorganized parts.
Again, a very good example can be found in the Wireshark source code.
Then, with your data stream, you can simply read the HTTP format and HTML data, or whatever you're dealing with.
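To make the capture step concrete, here is a rough sketch using the libpcap/WinPcap C API (the device choice and the port-80 filter are just illustrative defaults, and the TCP reassembly and HTTP parsing discussed above are deliberately left out). Note that live capture still generally requires the WinPcap/Npcap driver and elevated privileges, which was one of the constraints mentioned in the question:
#include <pcap.h>
#include <stdio.h>

/* Called once per captured packet; real code would parse the
 * Ethernet/IP/TCP headers here and feed a stream reassembler. */
static void on_packet(unsigned char *user, const struct pcap_pkthdr *h,
                      const unsigned char *bytes)
{
    (void)user; (void)bytes;
    printf("captured %u bytes\n", h->caplen);
}

int main(void)
{
    char errbuf[PCAP_ERRBUF_SIZE];
    pcap_if_t *devs;
    if (pcap_findalldevs(&devs, errbuf) == -1 || devs == NULL) {
        fprintf(stderr, "no capture devices: %s\n", errbuf);
        return 1;
    }

    /* Open the first device in promiscuous mode with a 1 s read timeout. */
    pcap_t *handle = pcap_open_live(devs->name, 65535, 1, 1000, errbuf);
    if (handle == NULL) {
        fprintf(stderr, "pcap_open_live: %s\n", errbuf);
        return 1;
    }

    /* Only look at HTTP (port 80) traffic. */
    struct bpf_program filter;
    pcap_compile(handle, &filter, "tcp port 80", 1, 0);
    pcap_setfilter(handle, &filter);

    pcap_loop(handle, -1, on_packet, NULL);   /* runs until interrupted */

    pcap_close(handle);
    pcap_freealldevs(devs);
    return 0;
}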
Hope that helps
If the user is cooperating, you could have them set their browser(s) to use a proxy service you provide. This would intercept all web traffic, do whatever you want with it (look up in your database, notify the user, etc), and then pass it on to the original location. Run the proxy on the local system, or on a remote system if that fits your case better.
If the user is not cooperating, or you don't want to make them change their browser settings, you could use one of the packet sniffing solutions, such as Fiddler.
A simple, straightforward way is to change the computer's DNS server setting to point to your application.
This will cause all DNS traffic to pass through your app, where it can be inspected and then forwarded to the real DNS server.
It will also save you the hassle of filtering out eMule/torrent traffic, as that normally works with plain IP addresses (which might also be a problem, since the approach can be circumvented by browsing with an IP address directly).
-How to change windows DNS Servers
-DNS resolver
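To illustrate the idea, here is a very rough sketch (in C rather than C#, purely to show the protocol mechanics; the 8.8.8.8 upstream resolver is just an example, and binding port 53 typically needs elevated rights as well): a tiny UDP forwarder that logs every queried hostname and relays the query to a real resolver.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in local = {0}, upstream = {0}, client;
    unsigned char buf[512];

    local.sin_family = AF_INET;
    local.sin_port = htons(53);                 /* needs admin rights to bind */
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    if (bind(s, (struct sockaddr *)&local, sizeof(local)) != 0)
        return 1;

    upstream.sin_family = AF_INET;
    upstream.sin_port = htons(53);
    inet_pton(AF_INET, "8.8.8.8", &upstream.sin_addr);

    for (;;) {
        socklen_t clen = sizeof(client);
        ssize_t n = recvfrom(s, buf, sizeof(buf), 0,
                             (struct sockaddr *)&client, &clen);
        if (n <= 12)
            continue;                           /* too short to be a DNS query */

        /* The queried name follows the 12-byte header as length-prefixed
         * labels ending in a zero byte (queries are not compressed). */
        char name[256] = "";
        for (ssize_t i = 12; i < n && buf[i]; i += buf[i] + 1) {
            if (strlen(name) + buf[i] + 2 > sizeof(name) || i + buf[i] >= n)
                break;
            strncat(name, (char *)&buf[i + 1], buf[i]);
            strcat(name, ".");
        }
        printf("lookup: %s\n", name);           /* compare against your database here */

        /* Pass the query on unchanged and relay the real answer back. */
        int u = socket(AF_INET, SOCK_DGRAM, 0);
        sendto(u, buf, (size_t)n, 0, (struct sockaddr *)&upstream, sizeof(upstream));
        n = recv(u, buf, sizeof(buf), 0);
        if (n > 0)
            sendto(s, buf, (size_t)n, 0, (struct sockaddr *)&client, clen);
        close(u);
    }
}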
Another simple way is to configure (programmatically) the browser's proxy settings to pass traffic through your server; this will make your life easier but will be more obvious to users.
How to create a simple proxy in C#?

iPhone/Mac - how to download files with AsyncSocket

I have a remote server with some files. I want to use AsyncSocket to download a file, chunk by chunk. I would like to send HTTP requests with ranges through the socket and get the appropriate chunks of data. I understand how to do this on localhost, but not from a remote server. I really don't know how to use the connectToHost and acceptOnInterface (previously acceptOnAddress) methods.
Please help
Thanks
AsyncSocket is a general purpose data connection. If you want it to talk HTTP, you'll need to code the HTTP portion yourself. You probably don't actually want this; NSURLConnection should do what you want, provided the server supports it.
What you're asking for is the Range: header in HTTP. See 14.35.2 in RFC2616. You just need to add this header to your NSURLRequest. Again, this presumes that the server you're talking to supports this (you need to check the Accept-Ranges: header in the response).
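For example, a ranged request for the first kilobyte of a (hypothetical) file would look something like this on the wire:
GET /files/archive.zip HTTP/1.1\r\n
Host: www.example.com\r\n
Range: bytes=0-1023\r\n
\r\n
A server that supports ranges answers with 206 Partial Content and a Content-Range: bytes 0-1023/<total size> header; one that doesn't will usually just return 200 with the whole file, which is why you check Accept-Ranges as mentioned above. You would then repeat the request with successive byte ranges to fetch the file chunk by chunk.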
There's a short article with example code about this at Surgeworks.
You should also look at ASIHTTPRequest, which includes resumable downloads and download progress delegates, and can likely be adapted to doing partial downloads. It may already have the solution to the specific issue you're trying to solve.

RESTful PUT and DELETE and firewalls

In the classic "RESTful Web Services" book (O'Reilly, ISBN 978-0-596-52926-0) it says on page 251 "Some firewalls block HTTP PUT and DELETE but not POST."
Is this still true?
If it's true I have to allow overloaded POST to substitute for DELETE.
Firewalls blocking HTTP PUT/DELETE are typically blocking incoming connections (to servers behind the firewall). Assuming you have control over the firewall protecting your application, you shouldn't need to worry about it.
Also, firewalls can only block PUT/DELETE if they are performing deep inspection on the network traffic. Encryption will prevent firewalls from analyzing the URL, so if you're using HTTPS (you are protecting your data with SSL, right?) clients accessing your web service will be able to use any of the standard four HTTP verbs.
Some layer-7 firewalls could analyze traffic to this degree, but I'm not sure how many places would configure them that way. You might check on serverfault.com to see how popular such a configuration might be (you could also always check with your IT staff).
I would not worry about overloading a POST to support a DELETE request.
HTML 4.0 and XHTML 1.0 only support GET and POST requests (via the form element), so it is commonplace to tunnel a PUT/DELETE via a hidden form field which is read by the server and dispatched appropriately. This technique preserves compatibility across browsers and allows you to ignore any firewall issues.
Ruby on Rails and .NET both handle RESTful requests in this fashion.
As an aside, GET, POST, PUT & DELETE requests are all fully supported through the XMLHttpRequest object at present. XHTML 2.0's form element officially supports GET, POST, PUT & DELETE as well.
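For illustration, an overloaded POST standing in for a DELETE might look like this on the wire (the path is made up, and the hidden _method field follows the Rails convention mentioned above; other stacks use a header such as X-HTTP-Method-Override instead):
POST /orders/42 HTTP/1.1\r\n
Host: www.example.com\r\n
Content-Type: application/x-www-form-urlencoded\r\n
Content-Length: 14\r\n
\r\n
_method=DELETE
The server-side framework reads the field (or header) and dispatches the request to its DELETE handler.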
You can configure a firewall to do whatever you want (at least in theory), so don't be surprised if some sysadmins do block HTTP PUT/DELETE.
The danger of HTTP PUT/DELETE concerns misconfigured servers: PUT replaces documents (and DELETE deletes them ;-) on the target server. So some sysadmins decide outright to block PUT in case a crack is opened somewhere.
Of course we are talking about firewalls acting at "layer 7" and not just at the IP layer ;-)