How do I know what to name a file downloaded using HTTP? - sockets

I am creating an HTTP client downloader in Python. I can download a file such as http://www.google.com/images/srpr/logo11w.png just fine; however, I'm not sure what to actually name the file on disk.
There is of course the filename at the end of the URL, but is this always reliable?

If I recall correctly, wget uses the following heuristic:
If a Content-Disposition header exists, get the filename from there (wget only honors this header when run with --content-disposition).
If the filename component of the URL exists (e.g. http://myserver/filename), use that.
If there is no filename component (e.g. http://www.google.com), derive the filename from the Content-Type header (such as index.html for text/html).
In all cases, if this filename is already present in the directory, either add a numerical suffix (wget names the second copy index.html.1; browsers tend to use index (1).html) or overwrite, depending on configuration.
There are plenty of other flags that control other heuristics, such as --adjust-extension (-E), which appends .html to files served as text/html from URLs that don't end in .html (ASP pages, for example).
In short, it depends on how far you want to go. For most people, the first two rules plus a basic Content-Type-to-extension mapping should be enough.
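A minimal Python sketch of that order of preference (Content-Disposition, then the last URL path segment, then "index" plus an extension guessed from Content-Type), with a browser-style numeric suffix on collision. It assumes the response headers have already been parsed into a dict with lower-case keys; the function names and the small EXTENSIONS table are illustrative, not taken from wget.

```python
import mimetypes
import os
import posixpath
from email.message import Message
from urllib.parse import unquote, urlsplit

# Explicit map for common types; anything else falls back to mimetypes.
EXTENSIONS = {"text/html": ".html", "image/png": ".png", "image/gif": ".gif"}


def _safe_basename(name):
    """Keep only the last path component so the server can't pick the directory."""
    name = posixpath.basename(name.replace("\\", "/"))
    return name if name not in ("", ".", "..") else None


def filename_from_response(url, headers):
    """Choose a local filename: header first, then URL, then content type.

    `headers` is assumed to map lower-case header names to values.
    """
    disposition = headers.get("content-disposition")
    if disposition:
        msg = Message()
        msg["Content-Disposition"] = disposition
        raw = msg.get_filename()  # parses quoted and RFC 2231-encoded parameters
        name = _safe_basename(raw) if raw else None
        if name:
            return name
    # Last component of the URL path, percent-decoded
    from_url = _safe_basename(unquote(urlsplit(url).path))
    if from_url:
        return from_url
    # No usable name: "index" plus an extension derived from Content-Type
    content_type = headers.get("content-type", "").split(";")[0].strip().lower()
    ext = EXTENSIONS.get(content_type) or mimetypes.guess_extension(content_type) or ""
    return "index" + ext


def unique_path(directory, name):
    """Return directory/name, adding ' (1)', ' (2)', ... if the file already exists."""
    stem, ext = os.path.splitext(name)
    candidate = os.path.join(directory, name)
    n = 1
    while os.path.exists(candidate):
        candidate = os.path.join(directory, f"{stem} ({n}){ext}")
        n += 1
    return candidate
```

For example, filename_from_response("http://www.google.com/images/srpr/logo11w.png", {}) gives "logo11w.png", and unique_path(".", name) then picks a name that doesn't clobber an existing file. Stripping directory components from the Content-Disposition value matters: otherwise a server could suggest a name like ../../.bashrc.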

Related

To encode or not to encode path parts in GCS?

Should path parts be encoded or not encoded when it comes to Google Cloud Storage?
Encoding URI path parts says they should be encoded, but Object names talks about the possibility of naming GCS objects in a seemingly-hierarchical manner...
So if I name an object abc/xyz, is the path to my object https://www.googleapis.com/storage/v1/b/example-bucket/o/abc%2fxyz or https://www.googleapis.com/storage/v1/b/example-bucket/o/abc/xyz?
Which is it!? Somebody please help me with this confusion.
TL;DR
You can use nested "folders" freely when working through a GCS client library, but if you send GET requests to the API URL yourself, you need to percent-encode the whole object name, slashes included.
Let's all pretend that folders are real
Yes, you need to encode the object names. There's a useful description here which I partially quote below (with my emphasis) for reference:
Object names reside in a flat namespace within a bucket, [...] means that objects do not reside within subdirectories in a bucket. For example, you can name an object /europe/france/paris.jpg to make it appear that paris.jpg resides in the subdirectory /europe/france, but to Cloud Storage, the object simply exists in the bucket and has the name /europe/france/paris.jpg.
So there are no subdirectories, but appropriate naming and a knowledgeable UI or API can make it appear as if there is a hierarchy.
All GCS client libraries know how to encode the names correctly, but if you are sending raw GETs (with appropriate authentication), you will have to do this yourself. The relevant section is here, and I quote the most relevant part:
For example, if you send a GET request for the object named foo/?bar in the bucket example-bucket, then your request URI should be:
GET https://www.googleapis.com/storage/v1/b/example-bucket/o/foo%2f%3fbar
So you can see that the object name part has been encoded, with %2f for the slash (/) and %3f for the question mark (?). There's a more complete description of the naming convention here.
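In Python, urllib.parse.quote() does this encoding if you pass safe="" (by default it leaves "/" unescaped, which is exactly the mistake to avoid here). A small sketch; object_metadata_url is a hypothetical helper name:

```python
from urllib.parse import quote

API_BASE = "https://www.googleapis.com/storage/v1"


def object_metadata_url(bucket, object_name):
    """JSON API URL for an object's metadata, with the name fully percent-encoded."""
    # safe="" makes quote() encode "/" too, so "abc/xyz" becomes "abc%2Fxyz"
    return f"{API_BASE}/b/{quote(bucket, safe='')}/o/{quote(object_name, safe='')}"
```

Python emits upper-case hex (%2F, %3F); per RFC 3986 the hex digits in percent-encoding are case-insensitive, so this is equivalent to the %2f in the documentation.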
Metadata v Content using GCS JSON API
I was slightly surprised that the default behaviour for the API was to return metadata about the object in the bucket. To get the actual content I had to append '?alt=media' as described at the end of this section:
By default, this responds with an object resource in the response body. If you provide the URL parameter alt=media, then it will respond with the object data in the response body.
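Putting the encoding and alt=media together, a download request for an object's contents (rather than its metadata) could be prepared as below. This is a sketch: access_token is a placeholder for an OAuth 2.0 token obtained separately, and media_request is a hypothetical name.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def media_request(bucket, object_name, access_token):
    """Prepare (but don't send) a GET for the object's data via the JSON API."""
    base = "https://www.googleapis.com/storage/v1"
    url = (
        f"{base}/b/{quote(bucket, safe='')}/o/{quote(object_name, safe='')}"
        + "?" + urlencode({"alt": "media"})  # ask for content, not metadata
    )
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})
```

urllib.request.urlopen(media_request("example-bucket", "abc/xyz", token)).read() would then return the object's bytes, assuming the token is valid and has read access to the bucket.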

How to pack a variable into an HTTP GET request in socket.send() - Python 2.7

First off thanks for reading!
Second off YES I have tried to find the answer! :) Perhaps I haven't found it because I'm not using the right words to describe my problem, but it's been about 4 hours that I've been trying to figure it out now and I'm getting a little loopy trying to piece it together on my own.
I am very new to programming. Python is my first language. I am on my third Python course. I have an assignment to use the socket library (not urllib library - I know how to do that) to make a socket and use GET to receive information. The problem is that the program needs to take raw input for the URL in question.
I have everything else the way I want it, but I need to know the syntax that I'm supposed to be using INSIDE my "GET" request in order for the HTTP message to include the requested document path.
I have tried (obviously not all together lol):
mysock.send('GET (url) HTTP/1.0\n\n')
mysock.send( ('GET (url) HTTP:/1.0\n\n'))
mysock.send(('GET (url) HTTP:/1.0\n\n'))
mysock.send("GET (url) HTTP/1.0\n\n")
mysock.send( ("'GET' (url) HTTP:/1.0\n\n"))
mysock.send(("'GET' (url) 'HTTP:/1.0\n\n'"))
and:
basically every other combination of the (, ((, ( (, ', and '' variations listed above.
I have also tried:
-Creating a string using the 'url' variable first, and then including it inside mysock.send(string)
-Again with the "string-first" theory, but this time I used %r to refer to my user input (so 'GET %r HTTP/1.0\n\n' % url basically)
I've read questions here and on other programming websites, the whole chapter in the book and all the lectures/notes online, articles on the socket library and .send(), and of course articles on GET requests... but I'm clearly missing something. It seems most people don't use the socket library when they can use urllib, and I don't blame them!!
Thank you again...
Someone from the university posted back to me that the url variable can be concatenated with the GET syntax and assigned to a string variable, which can then be passed to .send(concatenatedvariable). I had mentioned trying that, but I had missed that GET requires a space after the word 'GET', so of course concatenating didn't include a space, and that blew it. In case anyone else wants to know :)
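To make the two failures concrete, here is a hedged Python illustration; the value of path is a hypothetical stand-in for whatever the user typed. %r inserts the string's repr(), quotes and all, while concatenation only works if the spaces are part of the literals.

```python
# Hypothetical value typed in by the user (e.g. via raw_input()/input())
path = "/index.html"

# %r inserts repr(path), quotes included, which breaks the request line
broken = "GET %r HTTP/1.0\r\n\r\n" % path
# Concatenation works as long as the spaces around the path are included
joined = "GET " + path + " HTTP/1.0\r\n\r\n"
# %s inserts the plain text, giving the same result as concatenation
formatted = "GET %s HTTP/1.0\r\n\r\n" % path

# Python 3 sockets send bytes, so encode before mysock.send(...)
payload = joined.encode("ascii")
```

In Python 2.7 the str can be passed to send() directly; the .encode() step is only needed in Python 3.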
FYI: Sending a fully qualified URL in the request line (absolute-form) is meant for requests to proxies; HTTP/1.1 servers must also accept it, but it is not the norm. Normally you send only the path and query (origin-form) and, in HTTP/1.1, name the server in the required Host header. The relevant reading would have been RFC 7230, sec. 3.1.1 (and sec. 5.3 on request targets), and possibly RFC 3986. The key=value&key=value syntax of the query parameters is largely borrowed from the CGI format; it is in no way enforced, however. In a nutshell, everything put together would look like this on the wire:
GET /path?param1=value1&param2=value2 HTTP/1.1
Host: example.com
As a final note: The line delimiter in HTTP is CRLF (\r\n). For robustness, a simple linefeed is acceptable as well but not recommended.
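A Python 3 sketch that puts this together: it splits a user-supplied http:// URL into host and path, appends optional query parameters with urllib.parse.urlencode, and writes CRLF-terminated lines. build_request and fetch are hypothetical names; this handles plain HTTP only (no TLS), and Connection: close is added so the client can simply read until the server closes the socket.

```python
import socket
from urllib.parse import urlencode, urlsplit


def build_request(url, params=None):
    """Raw HTTP/1.1 GET request bytes in origin-form, CRLF-delimited."""
    parts = urlsplit(url)
    target = parts.path or "/"
    query = parts.query
    if params:
        extra = urlencode(params)  # key=value&key=value, percent-encoded
        query = f"{query}&{extra}" if query else extra
    if query:
        target += "?" + query
    lines = [
        f"GET {target} HTTP/1.1",
        f"Host: {parts.netloc}",
        "Connection: close",  # lets us read until EOF below
        "",
        "",  # the empty line that ends the header block
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(url, params=None, timeout=10):
    """Send the request over a plain TCP socket and return the raw response."""
    parts = urlsplit(url)
    with socket.create_connection((parts.hostname, parts.port or 80), timeout=timeout) as sock:
        sock.sendall(build_request(url, params))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: response complete
                break
            chunks.append(data)
    return b"".join(chunks)
```

print(fetch(input("URL: ")).decode("utf-8", "replace")) would print the raw response, status line and headers included.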

How to use Postgres and static file manipulation in Nginx

I would like to use Nginx as my CDN for a file hosting system. I saw a great module for nginx that allows Postgres connections (https://github.com/FRiCKLE/ngx_postgres). It works really well; however, when I try to use it together with an alias directive, it seems to ignore the alias (the file download) and gives me an empty file instead.
My idea is to take the UUID from the URL, find the correct file with a query, and then use the details found to set the filename header, so that the user's client automatically saves the download under the original filename instead of the UUID.
Here is the code.
location /dl {
    postgres_output none;
    postgres_pass database;
    postgres_query "SELECT * FROM \"Files\" WHERE uuid = '$args'";
    postgres_set $filename 0 name;
    alias /home/ubuntu/fileStorage;
    add_header Content-Disposition "attachment; filename=$filename";
}
I think the postgres directive is somehow locking up this block. Is there a way I can run the postgres query without affecting the download block?
It seems that you expect the line
add_header Content-Disposition "attachment; filename=$filename";
to cause the browser to download the file given by $filename. This is not how the Content-Disposition header works: it only tells the browser to treat the response body as a file to save (under the suggested filename). Here postgres_pass produces the response for this location, and with postgres_output none that body is empty, so the file under alias is never served. You're going to have to do something additional to get the proper content to the client. Perhaps what you really want is to issue a redirect?
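Whichever way the file ends up being served, the header value itself deserves care: filename=$filename, unquoted, breaks as soon as a stored name contains a space, semicolon or non-ASCII character. A Python sketch of a safer value, following the RFC 6266 pattern of a quoted ASCII fallback plus a UTF-8 filename* parameter; content_disposition is a hypothetical helper, not part of nginx or ngx_postgres.

```python
from urllib.parse import quote


def _ascii_fallback(name):
    """Replace quotes, backslashes, control and non-ASCII characters with '_'."""
    return "".join(
        "_" if ch in '"\\' or not 32 <= ord(ch) < 127 else ch for ch in name
    )


def content_disposition(filename):
    """Build an 'attachment' Content-Disposition value for an original filename."""
    fallback = _ascii_fallback(filename)  # for clients that ignore filename*
    encoded = quote(filename, safe="")    # UTF-8 percent-encoding for filename*
    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{encoded}"
```

For "report 2015.pdf" this yields attachment; filename="report 2015.pdf"; filename*=UTF-8''report%202015.pdf, and clients that understand filename* use the exact original name.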

Can I fake uploaded image filesize?

I'm building a simple image file upload form. Programmatically, I'm using the Laravel 5 framework. Through the Input facade (through Illuminate), I can resolve the file object, which in itself is an UploadedFile (through Symfony).
The UploadedFile's API ref page (Symfony docs) says that
public integer|null getClientSize()
Returns the file size. It is extracted from the request from which the file has been uploaded. It should not be considered as a safe value.
Return Value: integer|null The file size
What will be these cases where the uploaded filesize is wrongly reported?
Are there known exploits using this?
How can the admin ensure this is detected (and hence logged as a trespass attempt)?
That method uses the "Content-Length" header, which can easily be forged. You'll want to use $_FILES['myfile']['size'] instead. As an answer to another question (Can $_FILES[...]['size'] be forged?) has already stated:
This value checks the actual size of the file, and is not modified by the provided headers.
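If you'd like to check for people misbehaving, you can compare the Content-Length header with your $_FILES['myfile']['size'] value. Keep in mind that Content-Length covers the whole multipart request body (boundaries, part headers and any other form fields), so it will always be somewhat larger than the file itself.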
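To see why Content-Length and the file size never match exactly, this Python sketch encodes a single file as a multipart/form-data body the way a browser would and measures the difference. The field name, filename and 100-byte stand-in "image" are made up for illustration.

```python
import uuid


def multipart_body(field, filename, data, content_type="image/png"):
    """Encode one file as a multipart/form-data body; return (boundary, body)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return boundary, head + data + tail


file_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 92  # 100 bytes of stand-in image data
boundary, body = multipart_body("myfile", "logo.png", file_bytes)

content_length = len(body)                    # what the client sends as Content-Length
overhead = content_length - len(file_bytes)   # boundaries and part headers
```

So a sanity check should allow for that overhead (and for any other form fields) rather than expect the two numbers to be equal.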

How to create and implement a pixel tracking code

OK, here's something I've been trying to figure out for a while.
As is well known, most advertising and analytics companies use so-called "pixel" code to track website views, transactions, conversions, etc.
I have a general idea of how it works; the problem is how to implement it. Tracking code consists of a few parts:
The tracking code itself.
This is the code that the user inserts in the <head> section of their webpage. Its main purpose is to set some customer-specific variables and to load the *.js file.
*.js file.
This file holds all the magic of CRUD (create/read/update/delete) cookies, track user's events and interaction with the webpage.
The pixel code.
This is an <img> tag whose src attribute points to an image file (a *.gif, for example). All the parameters collected on the page are passed in that URL, and the server handling the request stores them in the database.
Examples:
WordPress pixel code: <img id="wpstats" src="http://stats.wordpress.com/g.gif?host=www.hostname.com&list_of_cookies_value_pairs;" alt="">
Google Analytics:
http://www.google-analytics.com/__utm.gif?utmwv=4&utmn=769876874&etc
Now, it's obvious that the *.gif request has to reach a server side scripting language in order to read the parameters data and store them in a db.
Does anyone have an idea how to implement this in Zend?
UPDATE
Another thing I'm interested in: how do I prevent the user's browser from loading a cached *.gif? Will a random parameter value do the trick? Example: src="pixel.gif?nocache=random_number", where the nocache value is different on every request.
As Zend is built using PHP, it might be worth reading the following question and answer: Developing a tracking pixel.
In addition to this answer and as you're looking for a way of avoiding caching the tracking image, the easiest way of doing this is to append a unique/random string to it, which is generated at runtime.
For example, server-side and with the creation of each image, you might add a random URL id:
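<?php
// $vara and $varb hold the values you want to report
// Generate a random 8-digit id for cache busting
$rand_id = mt_rand(10000000, 99999999);
// Build an encoded query string and echo the image with the random id appended
$query = http_build_query(array('a' => $vara, 'b' => $varb, 'rand' => $rand_id));
echo '<img src="pixel.php?' . htmlspecialchars($query) . '" alt="">';
?>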
Just adding my 2 cents to this thread, because I think an important and frequently used option is missing: you don't necessarily need a scripting language to capture the request. A more efficient approach is to let the web server's access log (the Apache access log, for instance) record the request, and then process that log with whatever tools you see fit, such as the ELK stack.
This makes serving the requests much lighter, because no scripting language is loaded to prepare the response; Apache serves the image natively, which is typically much more efficient.
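As a sketch of the log-processing side, the Python below pulls pixel hits out of an Apache access log in the "combined" format and decodes their query strings. The /pixel.gif path and the choice of captured fields are assumptions; adjust the pattern if your LogFormat differs.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Apache "combined" log format; only the fields used below are captured
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def pixel_hits(lines, pixel_path="/pixel.gif"):
    """Yield one dict per request for the tracking pixel, with decoded parameters."""
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines in another format
        target = urlsplit(m.group("target"))
        if target.path != pixel_path:
            continue  # not a pixel request
        params = {k: v[0] for k, v in parse_qs(target.query).items()}
        yield {
            "ip": m.group("ip"),
            "time": m.group("time"),
            "referer": m.group("referer"),
            "params": params,
        }
```

With open("access.log") as f: for hit in pixel_hits(f): ... would then feed each hit into whatever store or dashboard you use.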
First of all, the URL doesn't need to end in .gif; what matters is the Content-Type HTTP header. Set it to image/gif (or another appropriate type) at the beginning, execute your code, and write some sort of image to the response body.
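A minimal Python WSGI sketch of the same idea: the endpoint records the query parameters, then answers with Content-Type: image/gif and a 1x1 transparent GIF assembled byte by byte. record_hit and HITS are hypothetical stand-ins for a real database write; the Cache-Control header is one way to discourage browsers from reusing a cached copy, alongside the random URL parameter discussed in the question.

```python
from urllib.parse import parse_qs

# 1x1 transparent GIF, built byte by byte (43 bytes)
PIXEL_GIF = (
    b"GIF89a"
    b"\x01\x00\x01\x00\x80\x00\x00"              # 1x1 canvas, 2-colour global table
    b"\x00\x00\x00\xff\xff\xff"                  # colour table: black, white
    b"\x21\xf9\x04\x01\x00\x00\x00\x00"          # graphic control ext: colour 0 transparent
    b"\x2c\x00\x00\x00\x00\x01\x00\x01\x00\x00"  # image descriptor: 1x1 at (0, 0)
    b"\x02\x02\x44\x01\x00"                      # LZW min code size 2, one 2-byte block
    b"\x3b"                                      # trailer
)

HITS = []  # hypothetical in-memory store; replace with a database insert


def record_hit(params):
    HITS.append(params)


def pixel_app(environ, start_response):
    """WSGI app: log the query parameters, then return the pixel as image/gif."""
    params = {k: v[0] for k, v in parse_qs(environ.get("QUERY_STRING", "")).items()}
    record_hit(params)
    start_response("200 OK", [
        ("Content-Type", "image/gif"),
        ("Content-Length", str(len(PIXEL_GIF))),
        ("Cache-Control", "no-cache, no-store, must-revalidate"),
    ])
    return [PIXEL_GIF]
```

To try it locally, wsgiref.simple_server.make_server("", 8000, pixel_app).serve_forever() serves it at http://localhost:8000/?host=www.hostname.com.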
All of the above code is fine, but to be clear about the "g.gif" mentioned in the question:
you can add a little PHP code that writes to an SQL database, or to a file with file_put_contents("file.txt", $opened),
where $opened serves as a counter (incremented each time someone opens your email), and then save that script as "g.gif".
To make the server run it as PHP, add this:
<Files "g.gif">
AddType application/x-httpd-php .gif
</Files>
to your .htaccess file, and be sure to create a new directory for g.gif (or whatever.gif) that contains only that file and the .htaccess.