Best approach to upload the file via REST api from API gateway

Use case: A customer can upload a file through our public REST API to our S3 bucket, and we then process the file using downstream services.
After doing some research, I found 3 ways to do it:
1. Uploading using the OCTET-STREAM file type
2. Uploading the file using a form-data request
3. Uploading the file using a pre-signed URL
In the first 2 cases the user sends the binary file and we upload it to S3 after validating it.
In the 3rd method the user has to hit 3 APIs. The first call gets an S3 pre-signed URL, which grants the user access to upload the file to S3. In the second call the user uploads the file to that pre-signed URL. After the upload completes, the user sends a request to process the file.
Are there any security issues with the 3rd method? The user could misuse the pre-signed URL to upload a malicious file.
Which of these methods is best according to industry practice?
Details of each approach:
1. Uploading using OCTET-STREAM file type
Pros:
This method suits file types that are opened in a specific application, such as xlsx.
1 API hit. Direct file upload.
Cons:
This option is not suitable for uploading multiple files. If we need to support multiple file upload in the future, it would have to change to multipart/form-data (approach 2).
No metadata can be sent as a body parameter. Metadata can be sent in headers.
2. Upload the file using form-data request
User will upload the file with the API request by attaching it as multipart form.
Pros
We can send multiple files at the same time.
We can send extra parameters in the body.
3. Upload the file using the pre-signed URL
Cons
The customer has to hit 3 APIs to upload the file (2 API hits to upload and then 1 more API hit to process the file).

If you want them to load data into a bucket, the best way will almost always be the pre-signed URL. This gives you complete control over how you hand out access to the bucket, while still letting them upload directly into the bucket once they have that access.
In the first two approaches the user can send malicious data to your API, potentially DoSing the server or incurring costs for you to handle the payloads, because you have no control over access (the API is public).
In the third case they can request a URL from you, but that is all. Other than spamming you with requests for URLs, they can't access the bucket or do anything else unless you grant them a URL. This seems much better than having your upload endpoint spammed with large junk files that you must process before deciding you didn't want them anyway.
Finally, using the pre-signed URL is the pattern AWS expects you to use, so there is a lot of support for managing the access, roles, logging, monitoring, etc. that you would want to put around this service. When you stand up the upload API yourself, all of this is up to you to manage.
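The three calls in the pre-signed URL flow can be sketched from the client's side as follows. This is only an illustration using the Python standard library: the base URL `https://api.example.com`, the `/uploads` and `/uploads/{id}/process` paths, and the JSON body are hypothetical names, not part of the original question. The functions only build the requests; sending them is left to the caller.

```python
import json
import urllib.request

# Hypothetical public API base URL (assumption for illustration only).
API_BASE = "https://api.example.com"


def build_url_request(filename: str) -> urllib.request.Request:
    """Call 1: ask the API for a pre-signed URL for one file."""
    body = json.dumps({"filename": filename}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/uploads",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def build_put_request(presigned_url: str, payload: bytes) -> urllib.request.Request:
    """Call 2: send the file bytes straight to S3 with an HTTP PUT."""
    return urllib.request.Request(presigned_url, data=payload, method="PUT")


def build_process_request(upload_id: str) -> urllib.request.Request:
    """Call 3: tell the API the upload is done and ask it to process the file."""
    return urllib.request.Request(
        f"{API_BASE}/uploads/{upload_id}/process", data=b"", method="POST"
    )
```

Each request would be sent with `urllib.request.urlopen(...)`. Only the second call carries the file, and it goes to S3 rather than to your own API.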

Related

Static Website, File Upload and recaptcha

I am wondering what the best approach is to implement a simple form with file upload on a static website without any backend.
Scenario:
I have a static website (NuxtJS) where a form can be filled in and files can be uploaded.
To protect this form I wanted to use Google's reCAPTCHA, but as I read further in their documentation it seems that I need a backend, which is overkill for a static website.
Furthermore, I wanted to support file upload, which is quite complicated without a backend.
What I thought of:
Maybe there is an existing product which does exactly what I am looking for? Or should I build an AWS Lambda pipeline (with an S3 bucket, of course) to function as my "backend" for reCAPTCHA and file upload?
Is there any approach which makes this scenario simpler, or am I thinking in too complicated a way at the moment?
Use Case / Flow Chart:
User enters the website.
Fills out the form.
(optional) Uploads files.
Checks the reCAPTCHA.
Clicks Send, which sends a "message" to our company's Slack channel or by email.
In the end I solved this "common" task with a custom "backend" hosted on AWS Lambda, which makes the whole thing "serverless".
For those who are interested in how to set up a serverless backend, here's the flow chart I used.
As you can see, after the reCAPTCHA is completed on the client side and a token is generated, the token is sent to the AWS API Gateway, which triggers a Lambda function (a NodeJS implementation of a backend). There the token is validated and, for file uploads, pre-signed URLs are generated.
Note: The API Gateway and the S3 bucket need a valid CORS configuration to communicate with each other and the world.
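The server-side token check can be sketched roughly as below. The original backend is NodeJS; this is a Python sketch for illustration, and the function names are made up. It relies on Google's `siteverify` endpoint, which takes `secret` and `response` as form parameters and answers with JSON containing a `success` flag.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret: str, token: str) -> urllib.request.Request:
    """Build the form-encoded POST that asks Google to verify a client token."""
    body = urllib.parse.urlencode({"secret": secret, "response": token})
    return urllib.request.Request(VERIFY_URL, data=body.encode("ascii"), method="POST")


def token_is_valid(raw_response: bytes) -> bool:
    """Return True only if the verification response reports success."""
    return json.loads(raw_response).get("success") is True
```

Inside the Lambda handler, the request would be sent with `urllib.request.urlopen(...)` and the response body passed to `token_is_valid` before any pre-signed URL is handed out.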

GCS Signed Urls with subfolder in bucket

I have a bucket with a sub-folder structure for media,
e.g.
bucket/Org1/ ...
bucket/Org2/ ...
and I want to generate a signed URL for all the media inside each subfolder, so that users who belong to organization 1 can only view their own files.
Of course I don't want to generate a signed URL for each file (there can be a lot), and ACLs don't work either, because my users log in with a non-Google account (and may not have a Google account at all).
So is there any way to allow something like bucket/Org1/* ?
Unfortunately, no. For retrieving objects, signed URLs need to be for exact objects. You'd need to generate one per object.
One way to accomplish this would be to write a small App Engine app that users download from instead of going directly to GCS. The app would check authentication according to whatever mechanism you're using and then, if the user passes, generate a signed URL for that resource and redirect the user to it.
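The authorization step in such an app could look roughly like this. It is a sketch only: `signed_url_for` is a made-up name, and `sign` is a placeholder for whatever function actually produces the GCS signed URL, which is not shown here.

```python
from typing import Callable


def signed_url_for(user_org: str, object_name: str, sign: Callable[[str], str]) -> str:
    """Sign a URL for one object, but only if it lives under the user's org prefix.

    `sign` is a placeholder for the real signing routine (an assumption of
    this sketch); it receives the object name and returns the signed URL.
    """
    prefix = f"{user_org}/"  # trailing slash so "Org1" does not match "Org10/..."
    if not object_name.startswith(prefix):
        raise PermissionError(f"{object_name!r} is outside {prefix!r}")
    return sign(object_name)
```

The app would then answer with a redirect to the returned URL, so a URL is still generated per object, but only on demand.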

Google Cloud Storage Resumable Upload - Is uploadid Resumable

I am trying to figure out which is better: resumable upload or signed url. For upload only. Does anyone know if one uploadid can be used by multiple uploads? Or how can a user upload multiple files using one uploadid?
If your goal is to allow users without Google credentials to upload objects, signed URLs are your best bet. This is their intended purpose.
You can use uploadIds to accomplish the same goal, but they are much less featureful in this regard. For example, they do not support setting expiration times, and the server must set all parameters other than the data itself.

How to upload Files to Cloud Storage?

I have a Google Cloud Endpoints API which uses Cloud SQL to store data. I want to provide a file upload for clients, and the files should be stored in Cloud Storage, but I also want to store the file metadata and the file storage URL in Cloud SQL.
What's the best way to do this?
Can I upload files through Cloud Endpoints or do I need an extra upload servlet?
How can I update my database entities which need a reference to the uploaded files?
Are there any examples of how to combine those 3 technologies?
Assuming your clients are not added to your Google Cloud project (which is typically the case), your users don't have write access to your GCS bucket. You can either submit files to your application and move them to GCS from there (not recommended, as it consumes more network and CPU), or, better, submit to GCS directly.
To let the client write to your GCS bucket directly, you will need to either:
1. put your access key on the client for write access (not recommended, and only an option if the client is used by a limited set of trusted people), or
2. generate a time-bound token and give it to the client as a signed URL to upload directly.
Endpoints APIs themselves cannot do this, but you can generate the signed GCS URL on the server and fetch it on the client using Endpoints. Then set it as the form action (on a web client; other clients have similar ways of doing a signed upload) and submit the form to upload the file.
<form action="SIGNED_URL_FROM_ENDPOINTS" method="post" enctype="multipart/form-data">
I don't see any open-source code out there doing exactly this, but the closest is this project, which does generate the signed URL with a time-out (the only unintuitive part).
The best way to update the metadata in your database is to watch the GCS bucket using 'Object Change Notifications'. Another way is to send the metadata to your server from the client itself, which can be an Endpoints call. You can also use a mix of both, where the metadata goes to the server through Endpoints even before the file is uploaded, and the notification then updates the record with confirmation that the file is available to serve.
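The mixed approach (metadata first, confirmation later) amounts to a two-state record. Below is a minimal sketch of that idea. It uses `sqlite3` purely as a stand-in for Cloud SQL so the example is self-contained, and the table and column names are assumptions.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the metadata table (names are illustrative only)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "object_name TEXT PRIMARY KEY, content_type TEXT, status TEXT NOT NULL)"
    )


def register_upload(conn: sqlite3.Connection, object_name: str, content_type: str) -> None:
    """Called from the Endpoints method before the client uploads the file."""
    conn.execute(
        "INSERT INTO files (object_name, content_type, status) VALUES (?, ?, 'pending')",
        (object_name, content_type),
    )
    conn.commit()


def confirm_upload(conn: sqlite3.Connection, object_name: str) -> bool:
    """Called from the bucket notification handler once the object exists."""
    cur = conn.execute(
        "UPDATE files SET status = 'available' "
        "WHERE object_name = ? AND status = 'pending'",
        (object_name,),
    )
    conn.commit()
    return cur.rowcount == 1  # False if the object was never registered
```

Records that stay `pending` indicate uploads that were announced but never completed, which makes them easy to find and clean up later.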

Best practices to redirect a HTTP POST to my REST API towards my S3 bucket?

Say we want a REST API to support file uploads, and we want uploads to be done directly on S3.
According to this solution Amazon S3 direct file upload from client browser - private key disclosure, we have to create POLICY and SIGNATURE for user to be allowed to upload to S3.
However, we want a single entry point for the API, including uploads.
Can we:
1. in our API, catch POST https://www.example.org/users/1234/objects
2. calculate POLICY and SIGNATURE to allow direct upload to S3
3. return a 307 "Temporary Redirect" to https://s3-bucket.s3.amazonaws.com
How to pass POLICY and SIGNATURE in the redirect?
What is best practice here?
You don't redirect. Instead, your API should return the policy and signature in the response (say, in JSON).
The browser can then use these values to upload directly to S3, as described in the linked document. This is a two-step process.
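As a rough illustration of what the API could compute and return, here is a sketch that builds a POST policy and signs it with the standard library. It assumes AWS Signature Version 4, which may differ from the signature scheme in the linked answer. The function name, the size limit, the five-minute lifetime, and the choice of conditions are all assumptions of this sketch.

```python
import base64
import datetime
import hashlib
import hmac
import json


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def build_post_fields(access_key: str, secret_key: str, region: str, bucket: str,
                      key_prefix: str, now: datetime.datetime,
                      max_bytes: int = 10 * 1024 * 1024, ttl_seconds: int = 300) -> dict:
    """Return the form fields a browser needs for a direct POST upload to S3."""
    date = now.strftime("%Y%m%d")
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    credential = f"{access_key}/{date}/{region}/s3/aws4_request"
    expires = now + datetime.timedelta(seconds=ttl_seconds)
    policy = {
        "expiration": expires.strftime("%Y-%m-%dT%H:%M:%S.000Z"),
        "conditions": [
            {"bucket": bucket},
            ["starts-with", "$key", key_prefix],     # restrict where objects land
            ["content-length-range", 1, max_bytes],  # reject empty or huge files
            {"x-amz-algorithm": "AWS4-HMAC-SHA256"},
            {"x-amz-credential": credential},
            {"x-amz-date": amz_date},
        ],
    }
    encoded = base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")
    # SigV4 signing key: chained HMACs over date, region, service, terminator.
    k = _hmac(("AWS4" + secret_key).encode("utf-8"), date)
    for part in (region, "s3", "aws4_request"):
        k = _hmac(k, part)
    signature = hmac.new(k, encoded.encode("ascii"), hashlib.sha256).hexdigest()
    return {
        "policy": encoded,
        "x-amz-algorithm": "AWS4-HMAC-SHA256",
        "x-amz-credential": credential,
        "x-amz-date": amz_date,
        "x-amz-signature": signature,
    }
```

The API would serialize this dictionary as JSON. The browser then posts a multipart form to the bucket URL containing a `key` field that starts with the allowed prefix, these fields, and the file itself as the last field.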