Sending >100 MB files to a REST API

We have a REST service where clients send us batches of files. These files can be quite large (>100 MB). We had assumed we could do this via a REST API, but I was wondering what is considered best practice here. We are not tied to using REST.
Is there a limitation on IIS and REST that would make receiving many large files untenable? Where can things go wrong?
Is there a best practice for REST for doing this sort of thing?
Are there alternatives outside of REST that are considered better?

Take a look at this document.
https://learn.microsoft.com/en-us/dotnet/framework/wcf/feature-details/large-data-and-streaming
The maxAllowedContentLength setting in IIS and the maxReceivedMessageSize setting on the binding should be taken into consideration when transferring large data.
In short, for intranet transfer of large files the preferred NetTcpBinding in WCF gives good performance, and for Internet transfer of large files asynchronous streaming is a good idea. In addition, I don't think RESTful services necessarily perform better than SOAP web services when transferring large data.
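To make the streaming advice concrete outside of WCF, here is a minimal sketch in Node.js/TypeScript of accepting a large upload as a stream, so the 100+ MB body is never buffered in memory. The /files/ route and upload directory are assumptions for illustration only.

import * as http from "http";
import * as fs from "fs";
import * as path from "path";
import { pipeline } from "stream";
import { randomUUID } from "crypto";

const UPLOAD_DIR = "/tmp/uploads"; // hypothetical destination directory (must exist)

const server = http.createServer((req, res) => {
  if (req.method === "PUT" && req.url?.startsWith("/files/")) {
    const target = path.join(UPLOAD_DIR, randomUUID());
    // pipeline() streams the request body to disk chunk by chunk and
    // propagates errors and backpressure, so memory use stays flat.
    pipeline(req, fs.createWriteStream(target), (err) => {
      if (err) {
        res.writeHead(500).end("upload failed");
      } else {
        res.writeHead(201, { Location: req.url! }).end();
      }
    });
  } else {
    res.writeHead(404).end();
  }
});

server.listen(8080);

Whatever stack you choose, the same idea applies: read and write the body in chunks rather than loading it whole, and raise the relevant size limits (maxAllowedContentLength in IIS, maxReceivedMessageSize on the WCF binding) to match the largest file you expect.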

Related

Efficient architecture/tools for implementing async web API

Consider an event-driven, microservice-based web application that should expose some async web APIs. AFAIK the suggested way to achieve async HTTP request/response is to respond to each API call with, say, a 202 Accepted status code and a Location header that lets the caller retrieve the results later.
This means we have to generate a unique ID (like a UUID or GUID) for each request and store that ID and all related future events in persistent storage, so the API caller can track the progress of its request.
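For illustration, here is a minimal sketch of that 202 Accepted + Location pattern, assuming Express/TypeScript and an in-memory job map purely for demonstration; a real system would persist jobs in a database or Redis and hand work to a queue, as discussed below.

import express from "express";
import { randomUUID } from "crypto";

type Job = { status: "pending" | "done" | "failed"; result?: unknown };

const app = express();
app.use(express.json());
const jobs = new Map<string, Job>(); // stand-in for durable storage

app.post("/tasks", (req, res) => {
  const id = randomUUID();
  jobs.set(id, { status: "pending" });

  // Hand the work off to some asynchronous processor (queue, worker, etc.);
  // here it is simulated inline just to complete the example.
  process.nextTick(() => {
    jobs.set(id, { status: "done", result: { echoed: req.body } });
  });

  // Respond immediately; the caller polls the Location URL for progress.
  res.status(202).location(`/jobs/${id}`).end();
});

app.get("/jobs/:id", (req, res) => {
  const job = jobs.get(req.params.id);
  if (!job) {
    res.sendStatus(404);
    return;
  }
  res.json(job);
});

app.listen(3000);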
My question is how this API layer should be implemented, considering we may have tens or hundreds of thousands of requests and responses per second. What are the most efficient architecture and tools for building such an API under this load?
One way could be storing all the requests and all related events in both a database and a cache like Redis (just for a limited time, say 30 minutes).
Is there any better pattern/architecture/tooling? How have big companies and websites solved this issue?
Which database would be better for this scenario? (MongoDB, MySQL, …)
I really appreciate any useful answer, especially if you have some production experience.
Very valid question! In terms of architecture and tools, you should check out Zipkin, an open distributed tracing system tried and tested by Twitter. Especially if you have a microservice architecture, it is really useful for tracking down all your requests and responses. Its storage options include in-memory, JDBC (MySQL), Cassandra, and Elasticsearch.
If you are using Spring Boot for your microservices, then it is easily pluggable.
Even if you are not totally convinced by Zipkin, its architecture is worth looking into. From production experience, I have used it and it was really useful.

Adapter Proxy for Restful APIs

This is a general 'what technologies are available' question.
My company provides a web application with a RESTful API. However, it is too slow for my needs, and some of the results are in an awkward format.
I want to wrap their RESTful server with a proxy/adapter server, so that when you connect to the proxy you get the RESTful API I wish the real one provided.
So it needs to do a few things:
passthrough most requests
cache some requests
do some extra requests on the original server to detect if a request is cacheable
for instance: there is a request for a field in a record, GET /records/id/field, which might be slow, but there is a fingerprint request, GET /records/id/fingerprint, which is always fast. If a cached copy of GET /records/1/field2 exists for the fingerprint feedbeef, then I need to check that the original server still reports the fingerprint feedbeef before serving the cached version (see the sketch after this list)
fix headers for some responses - e.g. content-type, based upon the path
do stream processing on some large content, for instance
GET /records/id/attachments/1234
returns a 100 MB log file in text format
remove null characters from files
optionally recode the log to filter out irrelevant lines, reducing the load on the client
cache the filtered version for later requests.
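As a rough illustration of the fingerprint-validated cache described in the list above, here is a sketch using Express and the built-in fetch of Node 18+. The upstream host, the routes, and the content type are assumptions, and the streaming/filtering of large bodies is left out of this particular snippet.

import express from "express";

const UPSTREAM = "https://upstream.example.com"; // hypothetical origin API
const cache = new Map<string, { fingerprint: string; body: string }>();

const app = express();

app.get("/records/:id/field", async (req, res) => {
  const id = req.params.id;

  // The fingerprint request is cheap, so always ask the origin for it first.
  const fp = await (await fetch(`${UPSTREAM}/records/${id}/fingerprint`)).text();

  const hit = cache.get(id);
  if (hit && hit.fingerprint === fp) {
    // Fingerprint unchanged: the cached copy is still valid.
    res.type("application/json").send(hit.body);
    return;
  }

  // Cache miss or stale fingerprint: fetch the slow resource and remember it.
  const body = await (await fetch(`${UPSTREAM}/records/${id}/field`)).text();
  cache.set(id, { fingerprint: fp, body });
  res.type("application/json").send(body);
});

app.listen(8080);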
While I could modify the client to achieve this functionality, such code would not be reusable for other clients (in different languages), and it complicates the client logic.
I had a look at whether Clojure/Ring could do it, and while there is a nice little proxy middleware for it, it doesn't handle streaming content as far as I can tell - the whole 100 MB would have to be downloaded. It also doesn't include any cache logic yet.
I took a look at whether Squid could do it, but I'm not familiar with the technology, and it seems mostly concerned with passing requests through rather than modifying them on the fly.
I'm looking for hints on where I might find the right technology to implement this. I'm mostly language-agnostic if learning a new language gets me access to a really simple way to do it.
I believe you should choose a platform that is easier for you to implement your custom business logic on. The following web application frameworks provide easy connectivity with REST APIs, and allow you to create a web application that could work as a REST proxy:
Play framework (Java + Scala)
Express + Node.js (JavaScript)
Sinatra (Ruby)
I'm more familiar with Play, which I know provides caching utilities you could find useful and is also extensible through a number of plugins.
If you are familiar with Scala, you could also have a look at Finagle. It is a framework built by Twitter's infrastructure team to provide protocol-agnostic connectivity. It might be overkill for a REST-to-REST proxy, but it provides abstractions you might find useful.
You could also look at some third-party services like Apitools, which lets you create a proxy programmatically (in Lua). Apirise is a similar service (of which I'm a co-founder) that intends to provide similar functionality with a user-friendly UI.
Beeceptor does exactly what you want. It plugs in between your web app and the original API to route requests.
For your use case of caching a few responses, you can create a rule so that those requests never hit the original endpoint.
Requests to the original APIs can be mocked, and you can inspect responses.
You can simulate delays.
(Note: it is a shameless plug, I am the author of Beeceptor and thought it should help you and other developers.)
https://github.com/nodejitsu/node-http-proxy is looking useful - although I don't yet know if it can stream-process for transcoding.
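For what it's worth, node-http-proxy does appear to support on-the-fly stream processing via its selfHandleResponse option and proxyRes event (check the project's docs against your version). Here is a rough, untested sketch with a hypothetical upstream host that strips NUL characters from a proxied log file without buffering the whole body.

import * as http from "http";
import { Transform } from "stream";
import httpProxy from "http-proxy";

const proxy = httpProxy.createProxyServer({
  target: "https://upstream.example.com", // hypothetical origin API
  changeOrigin: true,
  selfHandleResponse: true, // we will write the response ourselves
});

proxy.on("proxyRes", (proxyRes, req, res) => {
  const stripNulls = new Transform({
    transform(chunk: Buffer, _enc, cb) {
      // Process the body chunk by chunk; the 100 MB log is never held in memory at once.
      cb(null, Buffer.from(chunk.filter((byte) => byte !== 0)));
    },
  });

  const headers = { ...proxyRes.headers };
  delete headers["content-length"]; // body length changes after filtering
  res.writeHead(proxyRes.statusCode ?? 502, headers);
  proxyRes.pipe(stripNulls).pipe(res);
});

http.createServer((req, res) => proxy.web(req, res)).listen(8080);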

Is REST suitable for web applications?

As I was reading about the REST interface, I came across a sentence from its developer that says:
The REST interface is designed to be efficient for large-grain hypermedia data transfer, optimizing for the common case of the Web, but resulting in an interface that is not optimal for other forms of architectural interaction.
Source: http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm
What are large-grain hypermedia transfers?
Normally a website consists of multiple small files: HTML, CSS, JS, and maybe some multimedia files like videos.
So is REST suitable for standard web applications, or just good for things like transferring megabyte-sized videos to a client?
REST is an architectural paradigm about accessing resources via URIs. It is up to you what a resource should be. It could be HTML, or it could be audio/video, or a .pdf, and so on.
And it is the job of the client to handle those resources; the web service just says "here it is".
'large-grain' is not the same as 'large':
not having a fine texture; "coarse-grained wood"; "large-grained sand" [syn: coarse-grained]
http://dictionary.reference.com/browse/large-grained
REST might not be the optimal solution for scenarios that are better modeled with messages, for example.
BTW: Where does your quote come from?
REST is used for any data transfer ranging in size from a simple form submission and upwards. So yes, it is suitable for standard web applications.
Today, the case where it is not optimal is really transferring data chunks that are smaller than a transaction. For example, if a user ticks several check-boxes on a page, current designers will try to initiate just one REST data transfer, not several.
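For illustration, a small sketch of that batching idea in TypeScript; the endpoint and payload shape are assumptions, not part of any standard.

// Instead of one request per checkbox, send the whole set of changes in a single call.
async function saveCheckboxes(changes: { id: string; checked: boolean }[]) {
  // One coarse-grained transfer covering all the fine-grained UI changes.
  await fetch("/api/preferences", {
    method: "PATCH",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ changes }),
  });
}

// Rather than: changes.forEach(c => fetch(`/api/preferences/${c.id}`, ...))
saveCheckboxes([
  { id: "newsletter", checked: true },
  { id: "notifications", checked: false },
]);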

Auto completion of a field, design for server side scalability

When handling an autocompletion feature for a form field, where every character typed by a user triggers an API call for suggestions, how do you proxy this call so that it scales?
Calling the API directly from JavaScript is not possible due to cross-domain restrictions, and it is not secure because it would expose the API keys.
Moving this to the controller or model would incur a lot of queries to the server side, which would put a heavy burden on it once the active user base reaches a certain size.
What's the standard industry practice for such a feature?
You'll need to be very smart on the client and on the server.
Use a lot of caching everywhere to avoid extra work. Use CORS or JSONP. And frankly speaking, this is a lot of work, not to mention that Lucene/Solr is not a very capable autocomplete engine.
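For instance, here is a minimal sketch of that proxy-plus-cache approach in Express/TypeScript: the browser calls your own domain (no cross-domain issue, and the API key stays server-side), and repeated prefixes are served from a short-lived in-memory cache. The upstream suggestion API, the key handling, and the TTL are assumptions for illustration.

import express from "express";

const SUGGEST_API = "https://suggest.example.com/v1"; // hypothetical upstream
const API_KEY = process.env.SUGGEST_API_KEY ?? "";
const TTL_MS = 60_000; // keep cached suggestions for one minute

const cache = new Map<string, { at: number; body: string }>();
const app = express();

app.get("/autocomplete", async (req, res) => {
  const q = String(req.query.q ?? "").toLowerCase();
  if (q.length < 2) {
    res.json([]); // don't hit the upstream for one-character prefixes
    return;
  }

  const hit = cache.get(q);
  if (hit && Date.now() - hit.at < TTL_MS) {
    res.type("application/json").send(hit.body); // cached prefix
    return;
  }

  const upstream = await fetch(
    `${SUGGEST_API}/suggest?q=${encodeURIComponent(q)}&key=${API_KEY}`
  );
  const body = await upstream.text();
  cache.set(q, { at: Date.now(), body });
  res.type("application/json").send(body);
});

app.listen(3000);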
Btw: look at www.rockitsearch.com. It has an autocomplete implementation with all the basic features. All you'll need to do is register and export your data there, then integrate the widget on your website.
Not sure what you mean by "proxy this call", but in general:
You can use JSONP for cross-domain queries, but you pay a performance penalty on the client side.
It's OK to query the same domain. There is no single answer, since the topic is very generic; how you scale depends on your infrastructure. If the application is designed to scale horizontally, you scale just by adding more servers to your server pool, which is pretty simple using Amazon or Azure cloud services. It is also important to optimize database queries and indexes so that the database responds fast. If the user base is big, you can even have multiple copies of the same database to help with performance.
Don't worry about optimizations prematurely, since you may never get to that point. If you do, it is a good problem to have, and in that case the solution is trivial.

REST vs. SOAP for large amount of data with Security

I have a requirement where I have a large amount of data (in the form of PDF, image, and doc files) on a server that will be distributed to many users. I want to pull these files, along with their metadata, using web services. I will be getting the files as bytes. I am confused about which type of web service will be more secure and easier to parse. Which one is easier to implement on an iPhone client?
I know REST is simpler, but I read somewhere that it is not suitable for a distributed environment. At the same time, SOAP is too heavy for a mobile platform.
I have searched many sites describing how REST is easier and how SOAP is more secure, and I am confused about which one to use.
Also, regarding the kind of response, which will be better for my requirement, JSON or XML?
For your requirements, JSON will be the better kind of response because it is much smaller than XML (more than 50% smaller in many tests). You can use SBJSON (https://github.com/stig/json-framework/) to parse it easily on iOS.
Concerning REST or SOAP: the latter is indeed really heavy for a mobile platform and not so easy to implement. SOAP also requires XML and cannot be used with JSON, whereas with REST you can use JSON or XML and easily implement it on iOS with RestKit (http://restkit.org/). For security you can use an SSL connection over HTTPS with a signed certificate.
The only advantage of SOAP is the WSDL (web service specification), which gives your web services a strong, formal contract.
Unless you have a specific requirement to pull the file data and the metadata in the same response, you might consider just pulling down the file with a regular HTTP GET. You can get decent security with HTTPS and Basic Auth or client certificates. I would then include a Link header pointing to the metadata for your file, as in:
Link: </path/to/metadata>;rel=meta
In particular, this lets you have separate caching semantics for the file itself and for its metadata, which is useful in the common case where the files are much larger than their metadata and where metadata can change without file contents changing.
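To make that concrete, here is a small sketch (Node 18+ TypeScript; the host, credentials, and Link parsing are simplified assumptions) of pulling a file over HTTPS with Basic Auth and then following the Link header to its metadata.

const BASE = "https://files.example.com"; // hypothetical file server
const auth = "Basic " + Buffer.from("user:password").toString("base64");

async function fetchFileWithMetadata(path: string) {
  // 1) Pull the file itself over HTTPS with Basic Auth.
  const fileRes = await fetch(`${BASE}${path}`, { headers: { Authorization: auth } });
  const file = Buffer.from(await fileRes.arrayBuffer());

  // 2) Follow the Link: </...>;rel=meta header to get the metadata separately,
  //    so the file and its metadata can be cached with different lifetimes.
  const link = fileRes.headers.get("link") ?? "";
  const match = link.match(/<([^>]+)>\s*;\s*rel=("?)meta\2/);
  const metadata = match
    ? await (await fetch(`${BASE}${match[1]}`, { headers: { Authorization: auth } })).json()
    : null;

  return { file, metadata };
}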