To encode or not to encode path parts in GCS? - google-cloud-storage

Should path parts be encoded or not encoded when it comes to Google Cloud Storage?
Encoding URI path parts says they should be encoded, but Object names talks about the possibility of naming GCS objects in a seemingly-hierarchical manner...
So if I name an object abc/xyz, is the path to my object https://www.googleapis.com/storage/v1/b/example-bucket/o/abc%2fxyz or https://www.googleapis.com/storage/v1/b/example-bucket/o/abc/xyz?
Which is it!? Somebody please help me with this confusion.

TL;DR
You can treat object names as nested folders when working through a GCS client library, but if you send GET requests to the URL yourself, you have to percent-encode the object name, slashes included.
Let's all pretend that folders are real
Yes, you need to encode the object names. There's a useful description here which I partially quote below (with my emphasis) for reference:
Object names reside in a flat namespace within a bucket, [...] means
that objects do not reside within subdirectories in a bucket. For
example, you can name an object
/europe/france/paris.jpg
to make it appear that paris.jpg resides in the subdirectory /europe/france, but to Cloud Storage, the object simply exists in the bucket and has the
name /europe/france/paris.jpg.
So there are no subdirectories but appropriate naming and the use of a knowledgeable UI or API will make it appear as if there is some hierarchy.
All GCS client libraries will know to encode the names correctly but if you are running raw GETs on them (with appropriate authentication), you will have to do this yourself. The relevant section is here and I quote the most relevant part here:
For example, if you send a GET request for the object named foo/?bar
in the bucket example-bucket, then your request URI should be:
GET https://www.googleapis.com/storage/v1/b/example-bucket/o/foo%2f%3fbar
So you can see that the object name part has been encoded, with %2f standing in for the slash (/) character and %3f for the question mark (?). There's a more complete description of the naming convention here.
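As a minimal sketch of doing that encoding by hand, assuming Python's standard urllib.parse (the helper name object_url is mine, not part of any GCS library):

```python
from urllib.parse import quote

API_ROOT = "https://www.googleapis.com/storage/v1"


def object_url(bucket, object_name):
    """Build the JSON API URL for one object (hypothetical helper).

    safe="" makes quote() encode "/" as well, so the whole object name
    stays a single path segment instead of looking like sub-paths.
    """
    return f"{API_ROOT}/b/{quote(bucket, safe='')}/o/{quote(object_name, safe='')}"


print(object_url("example-bucket", "foo/?bar"))
print(object_url("example-bucket", "abc/xyz"))
```

Note that quote() emits uppercase hex digits (%2F rather than %2f); percent-encoded octets are case-insensitive, so both spellings name the same object.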
Metadata v Content using GCS JSON API
I was slightly surprised that the default behaviour for the API was to return metadata about the object in the bucket. To get the actual content I had to append '?alt=media' as described at the end of this section:
By default, this responds with an object resource in the response
body. If you provide the URL parameter alt=media, then it will respond
with the object data in the response body.
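To make the difference concrete, here is a small sketch that builds both URLs for the same object. The function name and the download flag are my own illustration; only the alt=media parameter comes from the documentation quoted above.

```python
from urllib.parse import quote, urlencode


def object_request_url(bucket, object_name, download=False):
    """Return the metadata URL, or the content URL when download=True."""
    url = (
        "https://www.googleapis.com/storage/v1"
        f"/b/{quote(bucket, safe='')}/o/{quote(object_name, safe='')}"
    )
    if download:
        # alt=media asks for the object data instead of the object resource.
        url += "?" + urlencode({"alt": "media"})
    return url


print(object_request_url("example-bucket", "abc/xyz"))
print(object_request_url("example-bucket", "abc/xyz", download=True))
```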

Related

Can I fake uploaded image filesize?

I'm building a simple image file upload form. Programmatically, I'm using the Laravel 5 framework. Through the Input facade (through Illuminate), I can resolve the file object, which in itself is an UploadedFile (through Symfony).
The UploadedFile's API ref page (Symfony docs) says that
public integer | null getClientSize()
Returns the file size. It is extracted from the request from which the
file has been uploaded. It should not be considered as a safe
value. Return Value integer|null The file size
What will be these cases where the uploaded filesize is wrongly reported?
Are there known exploits using this?
How can the admin ensure this is detected (and hence logged as a trespass attempt)?
That method is using the "Content-Length" header, which can easily be forged. You'll want to use $_FILES['myfile']['size'] instead, as an answer to another question ("Can $_FILES[...]['size'] be forged?") has already stated:
This value checks the actual size of the file, and is not modified by the provided headers.
If you'd like to check for people misbehaving, you can simply compare the content-length header to your $_FILES['myfile']['size'] value.
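The general idea, measure the bytes you actually received and compare them with what the client claimed, can be sketched outside PHP as well. The following is a language-neutral illustration in Python, not Symfony or Laravel code; declared_size stands for whatever size the client reported, and both function names are made up for this example.

```python
import io


def measured_size(stream, chunk_size=64 * 1024):
    """Count the bytes actually readable from an uploaded stream."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return total
        total += len(chunk)


def check_declared_size(stream, declared_size):
    """Return (actual_size, mismatch) so the caller can log suspicious uploads."""
    actual = measured_size(stream)
    return actual, actual != declared_size


upload = io.BytesIO(b"x" * 1000)  # stand-in for the received file
actual, mismatch = check_declared_size(upload, declared_size=10)
print(actual, mismatch)
```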

How do I know what to name a file downloaded using HTTP?

I am creating an HTTP client downloader in Python. I am able to correctly download a file such as http://www.google.com/images/srpr/logo11w.png just fine. However, I'm not sure what to actually name the thing.
There is of course the filename at the end of the URL, but is this always reliable?
If I recall correctly, wget uses the following heuristic:
If a Content-Disposition header exists, get the filename from there.
If the filename component of the URL exists (e.g. http://myserver/filename), use that.
If there is no filename component (e.g. http://www.google.com), derive the filename from the Content-Type header (such as index.html for text/html)
In all cases, if this filename is already present in the directory use a numerical suffix, such as index (1).html, or overwrite, depending on configuration.
There are plenty of other flags that control other heuristics, such as creating .html for ASP/DHTML content-types.
In short, it really depends how far you want to go. For most people, doing the first two + basic Content-Type->name mapping should be enough.
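Since the question is about a Python client, here is a rough sketch of the first three steps using only the standard library. It is my approximation of the heuristic described above, not wget's actual implementation; guess_filename is a made-up name, and the collision handling (numerical suffix or overwrite) is left out.

```python
import mimetypes
import posixpath
from email.message import Message
from urllib.parse import unquote, urlsplit


def guess_filename(url, headers):
    """Pick a local filename from response headers and the URL."""
    lowered = {key.lower(): value for key, value in headers.items()}

    # 1. Content-Disposition, parsed with the email package's header parser.
    disposition = lowered.get("content-disposition")
    if disposition:
        msg = Message()
        msg["Content-Disposition"] = disposition
        name = msg.get_filename()
        if name:
            # Keep only the last component so a hostile header
            # cannot point outside the download directory.
            name = posixpath.basename(name.replace("\\", "/"))
            if name:
                return name

    # 2. Last path component of the URL.
    name = posixpath.basename(unquote(urlsplit(url).path))
    if name:
        return name

    # 3. Fall back to "index" plus an extension derived from Content-Type.
    content_type = lowered.get("content-type", "").split(";")[0].strip()
    return "index" + (mimetypes.guess_extension(content_type) or "")


print(guess_filename("http://www.google.com/images/srpr/logo11w.png", {}))
```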

Is it bad practice to allow specifying parameters in URL for POST

Should parameters for POST requests (elements of the resource being created) be allowed to be added to the URL as well as in the body?
For example, let's say I have a POST to create a new user at
/user
With the full set of parameters name, email, etc... in the body of the request.
However, I've seen many APIs that accept the values in either the body or URL parameters, like this:
/user?name=foo&email=foo@bar.com
Is there any reason this second option, allowing the parameters in the URL, is bad practice? Does it violate any component of REST?
The intent of a query parameter is to help identify the target resource for a request. The body of a POST should be used to specify instructions to the server.
The query component contains non-hierarchical data that, along with
data in the path component (Section 3.3), serves to identify a
resource within the scope of the URI's scheme and naming authority
(if any).
    -- RFC 3986 Section 3.4
The hierarchical path component and optional query component serve
as an identifier for a potential target resource within that origin
server's name space.
    -- RFC 7230 Section 2.7.1
The Udacity Web Development course, by Steve Huffman (the man behind Reddit), recommends using only POST requests to update server-side data. Steve highlights why using GET parameters to do so can be problematic.

How to design a REST API to allow returning files with metadata

Suppose I'm designing a REST API and I need the clients to be able to obtain files with metadata. What is a good way to design the resources / operations?
Some ideas come to mind:
A single resource (i.e. GET /files/{fileId}), which returns a multi-part response containing both the file and a JSON/XML structure with metadata. I have a feeling that this is not a very good approach. For example, you cannot use the Accept header for the clients to determine if they want an XML or a JSON metadata representation, since the response type would be multi-part in both cases.
Two resources (i.e. GET /files/{fileId} and GET /files/{fileId}/metadata), where the first one returns the file itself and the second one a JSON/XML structure with metadata. There can be a link from the metadata to the file. However, how do I send a link to the metadata along with the file?
I would suggest using the second idea you presented. This is the strategy used by most of the major web drives (Box, Dropbox, Google Drive, etc). They often have a significantly different URL because they store content and metadata in disparate locations.
You can add a Link header to the file response with a link to the metadata. Link headers are described in RFC 5988. The set of currently-registered link relations is here. Off the cuff, it seems that the describedBy relation is appropriate here.
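As a sketch of what that header could look like on the file response, assuming the /files/{fileId}/metadata layout from the question (the helper name and the default content type are only illustrative):

```python
def file_response_headers(file_id, content_type="application/octet-stream"):
    """Headers for GET /files/{fileId}, pointing at the metadata resource."""
    metadata_path = f"/files/{file_id}/metadata"
    return {
        "Content-Type": content_type,
        # Web Linking syntax: <target>; rel="relation-type"
        "Link": f'<{metadata_path}>; rel="describedby"',
    }


for name, value in file_response_headers(17, "image/png").items():
    print(f"{name}: {value}")
```

A client that downloads the file can then follow the Link header to fetch the metadata, and can use the Accept header there to choose between JSON and XML.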
I've had success with the following kind of API design. This differs slightly from what you suggested in that the main resource just contains links to its components.
POST /file
Request
<bytes of file>
Response
Location: /file/17
{
"id": 17
}
GET /file/17
{
"data": "/file/data/17",
"metadata": "/file/metadata/17"
}
GET /file/data/17
<bytes of file>
GET /file/metadata/17
{
"type": "image",
"format": "png"
}
DELETE /file/17
Your first option is not a good choice because it violates the following REST constraint.
Manipulation of resources through these representations under Uniform interface Principle.
When a client holds a representation of a resource, including any
metadata attached, it has enough information to modify or delete the resource.
If you break it, your URL will not be considered RESTful.
Wiki about it.

How to use URI as a REST resource?

I am building a RESTful API for retrieving and storing comments in threads.
A comment thread is identified by an arbitrary URI -- usually this is the URL of the web page that the comment thread relates to. This design is very similar to what Disqus uses with their system.
This way, on every web page, querying the related comment thread doesn't require storing any additional data at the client -- all that is needed is the canonical URL to the page in question.
My current implementation attempts to make a URI work as a resource by encoding the URI as a string as follows:
/comments/https%3A%2F%2Fexample.org%2Ffoo%2F2345%3Ffoo%3Dbar%26baz%3Dxyz
However, before dispatching it to my application, the request URI always gets decoded by my server to
/comments/https://example.org/foo/2345?foo=bar&baz=xyz
This isn't working because the decoded resource name now has path delimiters and a query string in it causing routing in my API to get confused (my routing configuration assumes the request path contains /comments/ followed by a string).
I could double-encode them or use some encoding scheme other than URI encoding, but that would add complexity to the clients, which I'm trying to avoid.
I have two specific questions:
Is my URI design something I should continue working with or is there a better (best?) practice for doing what I'm trying to do?
I'm serving API requests with a Go process implemented using the Martini 'microframework'. Is there something Go or Martini specific that I should do to make the URI-encoded resource names stay encoded?
Perhaps a way to hint to the routing subsystem that the resource name is not just a string but a URL-encoded string?
I don't know about the URL scheme for your application, but single percent-encoded values are valid in a URL in place of the characters they represent and should be decoded by the server, so what you are seeing is what I would expect. If you need to pass URL reserved characters as a value and not have them decoded as part of the URL, you will need to double percent-encode them. It's a fairly common practice, the complexity added to the client and server will not be that much, and a short comment will do fine.
In short, if you need to pass URL characters, double percent-encode them; it's fine.
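To show the round trip, here is a small sketch in Python rather than Go, using the example URL from the question. The function names are hypothetical; the point is only that one decoding pass by the server leaves a segment that is still free of "/" and "?".

```python
from urllib.parse import quote, unquote


def encode_thread_id(page_url):
    """Client side: percent-encode twice so one server-side decode is harmless."""
    once = quote(page_url, safe="")
    return quote(once, safe="")


def decode_thread_id(segment):
    """Application side: undo the remaining layer after the server decoded once."""
    return unquote(segment)


page = "https://example.org/foo/2345?foo=bar&baz=xyz"
request_path = "/comments/" + encode_thread_id(page)

# Simulate the server decoding the request path once before routing.
routed_segment = unquote(request_path)[len("/comments/"):]

print(request_path)
print(routed_segment)
print(decode_thread_id(routed_segment))
```

The router sees a single opaque segment after /comments/, and the handler recovers the original URL with one more decode.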