Google Cloud Storage: Setting incorrect MIME-type - google-cloud-storage

I have a Node.js server running on a Google Compute Engine virtual instance. The server streams incoming files to Google Cloud Storage (GCS). My code is in this question: Node.js stream upload directly to Google Cloud Storage
I'm passing Content-Type in the XML headers and it's working just fine for image/jpeg MIME-types, but for video/mp4 GCS is writing files as application/octet-stream.
There's not much to this, so I'm totally at a loss for what could be wrong ... any ideas are welcome!
Update/Solution
The problem was that the multiparty module was adding a content-type: application/octet-stream header to the 'part' object that I was piping to GCS. As a result, GCS received two Content-Type headers, with the octet-stream one last, and it used that one for the inbound file.
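The fix amounts to making sure only one Content-Type reaches GCS. The original server is Node.js with multiparty; the Python sketch below only illustrates the idea of dropping the header inherited from the multipart part and adding the intended type explicitly. The function name `single_content_type` and the example header dict are hypothetical.

```python
def single_content_type(part_headers, content_type):
    """Return upload headers that carry exactly one Content-Type.

    part_headers: headers copied from the incoming multipart part (hypothetical input)
    content_type: the MIME type you actually want GCS to store
    """
    # Drop any Content-Type inherited from the part, comparing names case-insensitively
    headers = {k: v for k, v in part_headers.items() if k.lower() != "content-type"}
    # Add the intended type so it is the only Content-Type sent
    headers["Content-Type"] = content_type
    return headers


part = {"content-type": "application/octet-stream", "Content-Disposition": 'form-data; name="file"'}
print(single_content_type(part, "video/mp4"))
```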

OK, looking at your HTTP request and response, it seems that the content type is specified in the URL returned by the initial HTTP request. That initial request should return the endpoint to which the file is then uploaded. I'm not sure why the content type appears there, but the documentation (https://developers.google.com/storage/docs/json_api/v1/how-tos/upload, "Start a resumable session") says that X-Upload-Content-Type needs to be specified, along with some other headers. That header doesn't appear in the HTTP requests mentioned above. There might be an issue with the library you're using, but the returned endpoint does not look like what the documentation describes.
Have a look at https://developers.google.com/storage/docs/json_api/v1/how-tos/upload, "Example: Resumable session initiation request", and see whether you still have the same issue when you specify the headers suggested there.
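To make the suggested headers concrete, here is a minimal Python sketch that only builds (does not send) a resumable session initiation request. The endpoint host, the use of the `name` query parameter, and the bearer token are assumptions to check against the current JSON API documentation; `resumable_session_request` is a hypothetical helper.

```python
from urllib.parse import quote


def resumable_session_request(bucket, object_name, content_type, content_length, access_token):
    """Build (url, headers) for starting a resumable upload session (not sent)."""
    # Assumed endpoint form; verify against the current JSON API upload docs
    url = (
        "https://storage.googleapis.com/upload/storage/v1/b/"
        + quote(bucket, safe="")
        + "/o?uploadType=resumable&name="
        + quote(object_name, safe="")
    )
    headers = {
        "Authorization": "Bearer " + access_token,
        # Describes the JSON metadata body of this request, if one is sent
        "Content-Type": "application/json; charset=UTF-8",
        # Type and size of the file data that will be uploaded to the session URI
        "X-Upload-Content-Type": content_type,
        "X-Upload-Content-Length": str(content_length),
    }
    return url, headers
```

The session URI returned in the response's Location header is where the file bytes go next; the X-Upload-Content-Type given here is what should end up as the object's content type.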

Google Cloud Storage is content-type agnostic, i.e., it treats any kind of content in the same way (videos, music, zip files, documents, you name it).
But just to give some idea:
First, I believe the size of the video you are uploading changes after it is uploaded, so it falls under application/<subtype> (similar to section 3.3 of RFC 4337).
To fix this, I believe you need to deal with storing the MP4 metadata before and after the file is uploaded.
Please let us know your solution.

A solution that worked for me in a similar situation is below. TL;DR: save video from a web app to GCS with content type video/mp4 instead of application/octet-stream.
Here is the situation. You want to record video in the browser and save it to Google Cloud Storage with a content type of video/mp4 instead of application/octet-stream. The user records a video and clicks a button to send the video file to your server. The server then sends the video file to Google Cloud Storage for saving.
The video is saved to Google Cloud Storage successfully, but by default GCS assigns it a content type of application/octet-stream.
To assign a content type video/mp4 instead of application/octet-stream, here is some server-side Python code that works.
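from google.cloud import storage
# bucket_name, destination_blob_name and file_obj are your own values
storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(destination_blob_name)
# Upload first; with no content type given, the object is stored as application/octet-stream
blob.upload_from_file(file_obj, rewind=True)
# Then overwrite the Content-Type metadata and send the change to GCS
blob.content_type = 'video/mp4'
blob.patch()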
Here are some links that might help.
https://cloud.google.com/storage/docs/uploading-objects#storage-upload-object-python
https://stackoverflow.com/a/33320634/19829260
https://stackoverflow.com/a/64274097/19829260
NOTE: At the time of this writing, the Google Cloud docs about editing metadata didn't work for me: they say to set metadata, but metadata seems to be read-only (see SO post https://stackoverflow.com/a/33320634/19829260).
https://cloud.google.com/storage/docs/viewing-editing-metadata#edit
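A variant of the code above, sketched under the assumption that you are using the google-cloud-storage Python client: `upload_from_file` also accepts a `content_type` argument, so the type can be sent with the upload itself instead of patched afterwards. The helper name `upload_video` is hypothetical.

```python
def upload_video(bucket, destination_blob_name, file_obj, content_type="video/mp4"):
    """Upload file_obj and set its Content-Type in the same request.

    bucket is expected to be a google.cloud.storage Bucket; any object with
    the same blob() and upload_from_file() methods works too.
    """
    blob = bucket.blob(destination_blob_name)
    # content_type travels with the upload, so no separate patch() call is needed
    blob.upload_from_file(file_obj, rewind=True, content_type=content_type)
    return blob
```

Passing the type up front saves one metadata request per object and avoids a window in which the object exists with the wrong type.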

Related

Data Factory can't download CSV file from web API with Basic Auth

I'm trying to download a CSV file from a website in Data Factory using the HTTP connector as my source linked service in a copy activity. It's basically a web call to a url that looks like https://www.mywebsite.org/api/entityname.csv?fields=:all&paging=false.
The website uses basic authentication. I have manually tested by using the url in a browser and entering the credentials, and everything works fine. I have used the REST connector in a copy activity to download the data as a JSON file (same url, just without the ".csv" in there), and that works fine. But there is something about the authentication in the HTTP connector that is different and causing issues. When I try to execute my copy activity, it downloads a csv file that contains the HTML for the login page on the source website.
While searching, I came across a GitHub issue on the docs suggesting that the basic auth header is not sent with the initial request, which may be causing the problem.
As I have it now, the authentication is defined in the linked service. I'm hoping that maybe I can add something to the Additional Headers or Request Body properties of the source in my copy activity to make this work, but I haven't found the right thing yet.
Suggestions of things to try or code samples of a working copy activity using the HTTP connector and basic auth would be much appreciated.
The HTTP connector expects the API to return a 401 Unauthorized response to the initial request; only then does it resend the request with the basic auth credentials. If the API doesn't do this (for example, it serves a login page instead), the connector never uses the credentials provided in the HTTP linked service.
If that is the case, go to the copy activity source, and in the additional headers property add Authorization: Basic followed by the base64 encoded string of username:password. It should look something like this (where the string at the end is the encoded username:password):
Authorization: Basic ZxN0b2njFasdfkVEH1fU2GM=
It's best if that isn't hard coded into the copy activity but is retrieved from Key Vault and passed as secure input to the copy activity.
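To produce (or double-check) the header value described above, a short Python sketch is enough. `basic_auth_header` is a hypothetical helper, and the credentials below are placeholders; in the pipeline itself the secret should still come from Key Vault.

```python
import base64


def basic_auth_header(username, password):
    """Return the value for an 'Authorization' additional header."""
    # Basic auth is the base64 encoding of "username:password" (UTF-8 bytes here)
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


print("Authorization: " + basic_auth_header("user", "pass"))
# Authorization: Basic dXNlcjpwYXNz
```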
I suggest you try the REST connector instead of the HTTP one. It supports Basic as an authentication type, and I have verified it against a test endpoint on httpbin.org.
After configuring the REST linked service with Basic authentication, create a dataset connected to it; you can then include that dataset in your copy activity.
Once the pipeline executes the content of the REST response will be saved in the specified file.

Google Cloud Bucket custom metadata set but not returned in the HTTP request

I've managed to add custom metadata to my public file stored in Google Cloud Bucket, but that custom header is not returned in the HTTP response.
I can see that my custom metadata (X-Content-Type-Options) was added to my object, but when I request that file from my browser, this custom header is not part of the response.
It is possible to add custom headers, but they will be prefixed with x-goog-meta-. AWS S3 has the same limitation, apparently for security reasons. The leanest solution I've found to overcome it is to use an edge function, such as AWS Lambda@Edge or Cloudflare Workers, that rewrites the headers on the fly. In my case, that means catching all headers that start with x-goog-meta- and removing that prefix.
Here is an article by someone who did that with AWS Lambda@Edge: https://medium.com/#tom.cook/edge-lambda-cloudfront-custom-headers-3d134a2c18a2
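The rewrite step an edge function performs can be sketched in Python as a pure function over a header dict. Real edge runtimes (Lambda@Edge, Cloudflare Workers) have their own request/response APIs; `strip_meta_prefix` is a hypothetical helper that shows only the header transformation, matching names case-insensitively.

```python
PREFIX = "x-goog-meta-"


def strip_meta_prefix(headers):
    """Return a copy of headers with the x-goog-meta- prefix removed."""
    result = {}
    for name, value in headers.items():
        if name.lower().startswith(PREFIX):
            # "x-goog-meta-X-Content-Type-Options" -> "X-Content-Type-Options"
            result[name[len(PREFIX):]] = value
        else:
            result[name] = value
    return result
```

If an unprefixed header with the same name already exists, whichever entry comes later in the dict wins; a production rewrite would need a deliberate rule for that case.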
You can use the x-goog-meta- prefix to set metadata on the object, for example when adding a single metadata entry or during a cp operation.
You can read the custom metadata with gsutil and the -L flag (for example, gsutil ls -L). You can also retrieve it with an HTTP request to the Storage API.
However, the custom metadata is not returned to your browser when you access the object via the URL https://storage.cloud.google.com/.... You have to build a proxy that requests the object through the Storage API (to get both the content and the custom metadata) and serves it with the expected headers.
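The core of such a proxy could look like the sketch below, assuming the google-cloud-storage Python client (`Bucket.get_blob`, `Blob.metadata`, `Blob.download_as_bytes`). `proxy_response` is a hypothetical helper; wiring it into a web framework and handling large objects with streaming are left out.

```python
def proxy_response(bucket, object_name):
    """Fetch an object and return (body, headers) for a proxy to send on.

    bucket is assumed to be a google.cloud.storage Bucket; get_blob() loads the
    object's metadata, including custom metadata, and returns None if missing.
    """
    blob = bucket.get_blob(object_name)
    if blob is None:
        return None, {}
    headers = {"Content-Type": blob.content_type or "application/octet-stream"}
    # Custom metadata is a plain dict (or None); expose its entries as real headers
    headers.update(blob.metadata or {})
    return blob.download_as_bytes(), headers
```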

How to get response-content-type working for Google Cloud Storage signedUrl

I have a .pdf object stored in Google Cloud Storage with Content-Type = application/octet-stream.
When giving temporary access through a signed URL, I extend the URL with:
&response-content-type=application%2Fpdf
Nevertheless, the response coming back from Google Cloud Storage still contains Content-Type = application/octet-stream
Inspecting the request + response through the browser confirms this behaviour.
According to the documentation (https://cloud.google.com/storage/docs/xml-api/reference-headers#responsecontenttype) the response-content-type should ensure Content-Type = application/pdf in my example.
For another use case, I am successfully using the Content-Disposition override via response-content-disposition, so I am very curious why response-content-type is not working for me.
Does anyone have an idea what I am missing to make this work?
Thanks!
According to the documentation, signed URLs are not among the authenticated GET requests that support the Content-Type override.
Query string parameters such as response-content-disposition and response-content-type are not verified by the signature. To force a Content-Disposition or Content-Type in the response, set them in the object metadata using gsutil or the XML/JSON API.
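Following that advice, a sketch with the google-cloud-storage Python client could store the type on the object first and then sign a plain GET URL. `sign_with_stored_type` is a hypothetical helper, and signing assumes credentials that are able to sign (for example, a service account key).

```python
from datetime import timedelta


def sign_with_stored_type(blob, content_type, minutes=15):
    """Store the desired Content-Type on the object, then return a signed GET URL.

    blob is assumed to be a google.cloud.storage Blob for an existing object.
    """
    if blob.content_type != content_type:
        blob.content_type = content_type
        blob.patch()  # persist the new Content-Type metadata on the object
    # No response-content-type override: the response uses the stored metadata
    return blob.generate_signed_url(
        version="v4",
        expiration=timedelta(minutes=minutes),
        method="GET",
    )
```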

upload small thumbnail images from iPhone app to Amazon S3

I noticed a couple of threads on this already, and they even provided sample code.
http://brunofuster.wordpress.com/2010/11/03/uploading-an-image-from-iphone-with-uiimagepicker-and-asihttprequests3/
But what baffled me is that there was no response to handle. Is that because S3 doesn't return any response? I expected to receive at least a URL to the image on S3; how can I get that?
If you look at the S3 REST object PUT documentation you will see the response that is returned from S3.
When you upload to S3, you already know the bucket name you are putting the image into and the filename. These two pieces of information should be all you need to build a URL to the image.
The documentation states that in addition to the PUT response header(s) you can see some of the common headers too.
"This implementation of the operation can include the following response headers in addition to the response headers common to all responses. For more information, see Common Response Headers."
If you look at the ASIHTTPRequest Amazon Simple Storage Service (S3) support you will see how to get a response from the ASIS3ObjectRequest object.
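Building the URL from the bucket and key, as suggested above, can be sketched in Python. This assumes a DNS-compatible bucket name and the virtual-hosted-style s3.amazonaws.com host; some regions may need a region-specific host, and the URL is only publicly readable if the object's ACL or bucket policy allows it. `s3_object_url` is a hypothetical helper.

```python
from urllib.parse import quote


def s3_object_url(bucket, key):
    """Build a virtual-hosted-style URL for an object you just PUT to S3."""
    # Keep "/" so key prefixes stay readable; percent-encode everything else
    return f"https://{bucket}.s3.amazonaws.com/{quote(key, safe='/')}"
```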
Tom,
If you wish to just get your S3 image url, you don't need the response information considering you already know the image name and the bucket (if there was no error).
Anyway, you can get the response from a synchronous request by using [request responseString] or [request responseData].
But the right thing to do is an async call using operation queues and delegates to get the response success or error. My blog post just provided a minimal sample. I will look into that and improve the post itself.
Thanks!
In addition to the answers already provided, you might also want to look at Amazon's recently released AWS SDK for iOS, which includes sample code for uploading images, etc. to S3.

How Do I Upload Multiple Files Using the iPhone

I am POSTing various values to the Posterous API. I can successfully upload the title, the body, and ONE media file, but when I add a second media file I get a 500 server error.
They do allow media and media[] as parameters.
How do I upload multiple files with the iPhone SDK?
The 500 you're getting is probably caused by one of two things:
An incorrect request
An error on the server
Now, if it's an incorrect request, the HTTP server would be more helpful responding with something like a 415 (Unsupported Media Type). A 500 suggests that something went wrong on the server and that your request itself was valid.
You'll have to dig into the server API or code (if you wrote it), or read the docs and figure out what's wrong with your second request. Maybe you're not setting the appropriate media type?
EDIT: OK, so I looked at the API. It appears you're posting XML, so your request content type should be
Content-Type: application/xml
The API doc didn't specifically say, but that would be the correct type.
EDIT: Actually, on second glance, are you just POSTing with URI parameters? Their API doc isn't clear (I'm also looking rather quickly).
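For reference, a multipart/form-data body carrying two files under the same media[] field name can be built as below. This is a Python sketch rather than iPhone SDK code, and it assumes (based only on the question's statement that media[] is accepted) that the API expects one part per file with a repeated media[] name. `multipart_body` is a hypothetical helper.

```python
import uuid


def multipart_body(fields, files, boundary=None):
    """Build a multipart/form-data body.

    fields: dict of plain form values (e.g. title, body)
    files: list of (field_name, filename, content_type, data_bytes);
           repeat the field name ("media[]") once per file
    Returns (body_bytes, content_type_header).
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(f"{value}\r\n".encode())
    for name, filename, ctype, data in files:
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'.encode()
        )
        parts.append(f"Content-Type: {ctype}\r\n\r\n".encode())
        parts.append(data + b"\r\n")
    # Closing boundary marks the end of the body
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned header value goes into the request's Content-Type, which is where a mismatched or missing boundary would typically make a server reject the second file.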