Why do I get different URLs for the same content? (Facebook)

I am debugging a Facebook Live streaming session and I have a URL of .m4v type whose source is https://video-mad1-1.xx.fbcdn.net. From another connection in another place I get this URL instead: https://video.fagp1-1.fna.fbcdn.net.
Why do I get different URLs for the same content?
Could it depend on the connection or the location (the distance between the two places is around 2 kilometers)?

Content distribution networks (CDNs) distribute servers all over the world to help serve content as close to the user as possible. This reduces latency, generally increases throughput, and cuts down on the traffic going over longer-haul links, which tend to be more restricted in the bandwidth they have available.
When you put content up on a CDN node, it's generally distributed to at least some other places, so it's entirely possible to have different URLs for the same content. Which hostname a given client is handed typically depends on its network and which edge node the CDN selects for it, so even two connections only a couple of kilometers apart can be directed to different nodes.
You might also find this interesting: https://anuragbhatia.com/2018/03/networking/isp-column/mapping-facebooks-fna-cdn-nodes-across-the-world/
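If you want to see this in action, a minimal sketch (Python; the hostnames are taken from the question, everything else is an illustration) that checks which addresses each hostname resolves to from the current network:
```python
# Check which addresses each CDN hostname resolves to from this network.
# The hostnames come from the question; the addresses you get will depend
# on where (and through which resolver) you run this.
import socket

for host in ("video-mad1-1.xx.fbcdn.net", "video.fagp1-1.fna.fbcdn.net"):
    try:
        addresses = sorted({info[4][0] for info in socket.getaddrinfo(host, 443)})
        print(host, "->", addresses)
    except socket.gaierror as err:
        print(host, "-> could not resolve:", err)
```
Running the same snippet from the two locations in the question would likely show different edge addresses, which is simply the CDN doing its job.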

Related

How to reduce the latency when indexing files using Lucene.net

Hi, we are planning to use Lucene.net as part of one of our products, which is mainly a content repository. In order to have better performance, our system will be using a Lucene query engine for the majority of content read operations. One of the major setbacks we anticipate is latency, and as we are unaware of any clustered or distributed Lucene implementation, what is the best method to reduce the latency?
Based on the comments I now understand that the latency concerns relate to IP packet latency of having a single index server that is on the other side of the planet from the user.
Lucene.net, like Lucene, is a library, not a platform. It provides blazingly fast indexing and querying, but does not provide higher-level platform features like a multi-server distributed index. Lucene does contain a lot of plumbing to support such use cases, and it is this plumbing that software products can build on top of to provide multi-server distributed indexes. The two most popular software platforms built on Lucene that provide such features are Apache Solr and Elasticsearch. It's important to note, however, that they are built on top of the Java version of Lucene rather than Lucene.net. I am not aware of similar alternatives for Lucene.net, and building your own would be a major undertaking.
It is possible, however, that the IP packet latency you are concerned about may not turn out to be a large issue; let me explain. It's often possible to send an IP request to the other side of the planet in under 300ms, certainly under 500ms. When rendering a web page served from the other side of the planet, with lots of images served from there as well, this latency can add up quickly. If the page had 20 images on it plus a few CSS files and a few JS files all loaded from that server, rendering the page could require say 25 requests. At 300ms latency per request, such a situation would bear 7500ms in latency, or 7.5 seconds. In reality some of the requests could be served in parallel, which would reduce the effective latency, and on top of this latency the server has to do work to meet each request, which takes time too. Anyway, this example is just for illustration, to agree that serving a web page from the other side of the world does raise IP latency concerns.
However, making a search request to Lucene from the other side of the planet does not have similar issues, because it need only be a single request. So, for example, if a person needs to issue a search to an API server on the other side of the planet running Lucene.net, the request (probably JSON) is a single request that bears the IP latency, say 300ms. That request is serviced by Lucene.net and the results are returned as a single response, which the app or web page that made the request then renders. So the total IP latency is just the IP latency of a single request, which for many use cases would be fine even across the globe.
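To make that concrete, here is a minimal sketch that times one such round trip; the endpoint URL and the shape of the JSON response are hypothetical, the point is only that the cross-planet latency is paid once per search:
```python
# Time a single search request against a remote search API.
# The base URL and the "hits" field in the response are illustrative assumptions.
import json
import time
import urllib.parse
import urllib.request

def timed_search(base_url, query):
    url = f"{base_url}/search?q={urllib.parse.quote(query)}"
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=10) as response:
        payload = json.load(response)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return elapsed_ms, payload

elapsed, results = timed_search("https://search.example.com", "content repository")
print(f"one round trip took {elapsed:.0f} ms, {len(results.get('hits', []))} hits")
```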

Is there a standard way of combining multiple API calls into one HTTP request?

While designing REST APIs I from time to time face the challenge of dealing with batch operations (e.g. deleting or updating many entities at once) to reduce the overhead of many TCP client connections. In each particular situation the problem is usually solved by adding a custom API method for the specific operation (e.g. POST /files/batchDelete which accepts the ids in the request body), which doesn't look pretty from the point of view of REST API design principles but does the job.
But a general solution to the problem is still desirable to me. Recently I found the Google Cloud Storage JSON API batching documentation, which looks to me like a fairly general solution. I mean, a similar format could be used for any HTTP API, not just Google Cloud Storage. So my question is: does anybody know of a general standard (a standard or a draft of one, a guideline, a community effort or the like) for combining multiple API calls into one HTTP request?
I'm aware of the capabilities of HTTP/2, which include using a single TCP connection for many HTTP requests, but my question is aimed at the application level. In my opinion that still makes sense, because despite the ability to use HTTP/2, handling this at the application level seems like the only way to guarantee it for any client, including HTTP/1 clients, which are currently the most widely used.
TL;DR
Neither REST nor HTTP is ideal for batch operations.
Caching, which is one of REST's constraints and is mandatory rather than optional, usually gets in the way of batch processing in some form.
It might be beneficial not to expose the data you want to update or remove in batch as resources of their own, but as data elements within a single resource, like a data table in an HTML page. Updating or removing all or some of the entries is then straightforward.
If the system is generally write-intensive, it is probably better to think of other solutions, such as exposing the DB directly to those clients, to spare a further level of indirection and complexity.
Utilizing caching can prevent a lot of workload on the server and even spare unnecessary connections.
To start with, neither REST nor HTTP is ideal for batch operations. As Jim Webber pointed out, the application domain of HTTP is the transfer of documents over the Web. This is what HTTP does and this is what it is good at. Any business rules we layer on top are just a side effect of that document management, and we have to come up with solutions to turn these document-management side effects into something useful.
As REST is just a generalization of the concepts used in the browsable Web, it is no surprise that the same concepts that apply to Web development also apply to REST development in some form. A question like how something should be done in REST therefore usually comes down to answering how it should be done on the Web.
As mentioned before, HTTP isn't ideal in terms of batch processing actions. Sure, a GET request may retrieve multiple results, though in reality you obtain one response containing links to further resources. The creation of a resource has, according to the HTTP specification, to be indicated with a Location header that points to the newly created resource. POST is defined as an all-purpose method that allows tasks to be performed according to server-specific semantics, so you could basically use it to create multiple resources at once. However, the HTTP spec clearly lacks support for indicating the creation of multiple resources at once, as the Location header may only appear once per response and may only contain one URI. So how can a server indicate the creation of multiple resources to the client?
A further indication that HTTP isn't ideal for batch processing is that a URI must reference a single resource. That resource may change over time, but the URI can never point to multiple resources at once. The URI itself is, more or less, used as a key by caches, which store a cacheable response representation for that URI. As a URI may only ever reference one single resource, a cache will also only ever store the representation of one resource for that URI. A cache invalidates a stored representation for a URI when an unsafe operation is performed on that URI. In the case of a DELETE operation, which is unsafe by nature, the representation for the URI the DELETE is performed on will be removed. If you now "redirect" the DELETE operation to remove multiple backing resources at once, how should a cache take notice of that? It only operates on the URI that was invoked. Hence even when you delete multiple resources in one go via DELETE, a cache might still serve clients with outdated information, as it simply hasn't taken notice of the removal yet and its freshness value would still indicate a fresh-enough state. Unless you disable caching by default, which somewhat violates one of REST's constraints, or reduce the period a representation is considered fresh to a very low value, clients will probably be served outdated information. You could of course perform an unsafe operation on each of those URIs to "clear" the cache, but in that case you could just as well have invoked the DELETE operation on each resource you wanted to batch delete in the first place.
It gets a bit easier, though, if the batch of data you want to remove is not explicitly captured via resources of its own but as data of a single resource. Think of a data table on a Web page with certain form elements, such as a checkbox you can click to mark an entry as a delete candidate, and a submit button that sends the selected elements to the server, which then removes those items. Here only the state of one resource is updated, and thus a simple POST, PUT or even PATCH operation can be performed on that resource URI. This also goes well with caching, as outlined before, since only one resource has to be altered, and the use of unsafe operations on that URI automatically leads to the invalidation of any stored representation for it.
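A rough sketch of that idea, assuming a hypothetical /documents resource whose entries are plain data elements rather than resources of their own (Python/Flask; the resource name, in-memory storage and payload shape are invented for illustration):
```python
# Single-resource batch removal: the client POSTs the ids it marked for deletion
# instead of issuing one DELETE per entry. Names and storage are illustrative.
from flask import Flask, jsonify, request

app = Flask(__name__)

# The whole table is one resource; the rows are just data elements inside it.
entries = {"1": "report.pdf", "2": "draft.docx", "3": "notes.txt"}

@app.route("/documents", methods=["POST"])
def update_documents():
    selected = request.get_json(force=True).get("remove", [])
    for entry_id in selected:
        entries.pop(entry_id, None)
    # Only /documents is modified, so caches have a single URI to invalidate.
    return jsonify(entries)

if __name__ == "__main__":
    app.run()
```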
The above-mentioned usage of form elements to mark certain entries for removal depends, however, on the media type used. In the case of HTML, its forms section specifies the available components and their affordances. An affordance is the knowledge of what you can and should do with certain objects: a button or link is meant to be pushed, a text field may expect numeric or alphanumeric input which may further be length-limited, and so on. Other media types, such as hal-forms, halform or ion, attempt to provide form representations and components for a JSON-based notation; however, support for such media types is still quite limited.
As one of your concerns is the number of client connections to your service, I assume you have a write-intensive scenario, as in read-intensive cases caching would probably take a good chunk of the load off your server. The BBC, for instance, once reported that they could reduce the load on their servers drastically just by introducing a one-minute caching interval for recently requested resources. This mainly affected their start page and the linked articles, as people clicked on the latest news much more often than on old news. Receiving a couple of thousand, if not hundreds of thousands of, requests per minute, they could, as mentioned before, significantly reduce the number of requests actually reaching the server and therefore take a huge load off their servers.
Write-intensive use cases, however, can't benefit from caching as much as read-intensive ones, as the cache would get invalidated quite often and the actual requests forwarded to the server for processing. If the API is more or less just used to perform CRUD operations, as so many "REST" APIs do in reality, it is questionable whether it wouldn't be preferable to expose the database directly to the clients. Almost all modern database vendors ship with sophisticated user-rights management options and allow views to be created that can be exposed to certain users. The "REST API" on top of it basically just adds a further level of indirection and complexity in such a case. By exposing the DB directly, performing batch updates or deletions shouldn't be an issue at all, as support for such operations should already be built into the DB layer through the respective query languages.
In regard to the number of connections clients create: HTTP has allowed the reuse of connections since 1.0 via the Connection: keep-alive header directive. In HTTP/1.1 persistent connections are used by default unless explicitly requested to close via the respective Connection: close header directive. HTTP/2 introduced full-duplex connections that allow many channels, and therefore many requests, to reuse the same connection at the same time. This is more or less a fix for the connection limit suggested in RFC 2616, which plenty of Web developers worked around by using CDNs and similar tricks. Currently most implementations use a maximum of 100 channels, and therefore simultaneous downloads, per connection, AFAIK.
Opening and closing a connection takes a bit of time and server resources, and the more open connections a server has to deal with, the more the system may suffer, though open connections with hardly any traffic aren't a big issue for most servers. While connection creation used to be considered the costly part, with persistent connections that cost has moved towards the number of requests issued, hence the desire for batch requests, which HTTP is not really made for. Again, as mentioned throughout this post, smart utilization of caching means plenty of requests may never need to reach the server at all, which is probably one of the best optimization strategies for reducing the number of simultaneous requests. Probably the best advice in such a case is to look at which kinds of resources are requested frequently, which requests take up a lot of processing capacity, and which ones can easily be answered from a cache.
"reduce overhead of many tcp client connections"
If this is the crux of the issue, the easiest way to solve it is to switch to HTTP/2.
In a way, HTTP/2 does exactly what you want. You open one connection, and over that connection you can send many HTTP requests in parallel. Unlike batching in a single HTTP request, it's mostly transparent to clients, and requests and responses can be processed out of order.
Ultimately batching multiple operations in a single HTTP request is always a network hack.
HTTP/2 is widely available. If HTTP/1.1 is still the most used version (this might be true, but the gap is closing), that has more to do with servers not yet being set up for it than with clients.
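For what it's worth, a small sketch of what the HTTP/2 approach looks like from the client side, using the httpx library (installable as httpx[http2]); the URLs are placeholders, and whether you actually get HTTP/2 depends on the server:
```python
# Many requests multiplexed over one HTTP/2 connection, no application-level batching.
# Requires the third-party httpx package with its http2 extra; URLs are placeholders.
import asyncio
import httpx

async def fetch_all(urls):
    async with httpx.AsyncClient(http2=True) as client:
        responses = await asyncio.gather(*(client.get(url) for url in urls))
        for resp in responses:
            print(resp.url, resp.status_code, resp.http_version)

asyncio.run(fetch_all([f"https://example.org/items/{i}" for i in range(10)]))
```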

How can I improve response time if the remote server is located a very far physical distance away?

I want to know how to structure servers physically in this situation.
Let's assume that my service is provided in the USA.
My business is quite successful, so I want to expand it to Asia.
But I don't want to localize the service, so I just set up an API server in Asia which simply calls the API located at headquarters; my main components are still in the USA.
The problem is that my API located in Asia needs to call the headquarters API located in the USA, and the response is quite often slow because of the large physical distance.
So in this situation, how can I overcome this?
In my opinion I can get a CDN for static content, but I have no idea how to improve the API response time problem that stems from the physical distance.
If it is a stupid question, please understand; I'm quite a newbie at architecture.
EDIT:
Also, how can I set up database replication in this situation?
If I have a replica in Asia that replicates from the USA, I think the replication performance will be quite poor because of the physical distance.
How do Amazon or other global services build this?
Replication performance can be quite poor. It is important to understand how much of your data is changing so that you can estimate the bandwidth required and understand whether your replication can keep up.
Amazon and other global services deal with this via a combination of replication, edge-caching (CDN), and other methodologies that bring the data closer to the consumer.
As a first step, you also might want to look at just making your API more coarse-grained. The fewer calls you have to make, the higher the performance (as the problem is likely latency, not bandwidth). See if you can batch things up instead of handling them one-at-a-time.
You can also look critically at caching. Instead of making your read-only API calls every time, introduce some Cache-Control headers to specify the acceptable age of your requests. A lot of data is very static: things like user data, departments, product info, etc. Some of this data can leverage caching layers to become much more performant.
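As an illustration, a read-mostly endpoint can declare how long its responses may be reused; a minimal sketch (Python/Flask, with an invented endpoint, payload and TTL):
```python
# A read-only endpoint that lets clients and intermediaries reuse its responses.
# The endpoint name, payload and five-minute max-age are illustrative assumptions.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/products/<int:product_id>")
def product_info(product_id):
    response = jsonify({"id": product_id, "name": "example product"})
    # Fairly static data: allow any cache to reuse this response for five minutes.
    response.headers["Cache-Control"] = "public, max-age=300"
    return response

if __name__ == "__main__":
    app.run()
```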
If you want to use AWS and want to host your main components in a specific region, you may think of hosting them yourself on EC2 instances [as the origin servers] in the region of your choice and using CloudFront (CDN) to serve the content globally. AWS employs its own high-speed backbone network to reduce latency between geographically distant locations by reducing the number of network hops.
From a caching standpoint, as Rob rightly said, CloudFront applies different caching mechanisms for hot and warm objects (edge caching, regional caching); also, the origin servers can send a minimum and a maximum expiration time in HTTP headers to define the caching TTL.
If, however, you don't want to take advantage of the high-speed backbone network, you should design your endpoints and functionality with latency as a constraint, use appropriate TTLs for cached objects, and define an appropriate caching strategy, keeping in mind the read/write ratio of your application.

Multimedia Streaming: Users connecting with multiple tabs

We're writing a streaming service using wowza as our streaming server with a flash applet on the client. A recent concern we've had is a single client accessing the same stream from multiple tabs. In this scenario, we want to save bandwidth by sending the client only one copy of the stream.
How is this problem typically handled?
I think this is not possible, for multiple technical reasons. (This is my personal opinion, so experts can correct me if I am wrong.)
The Flash players in multiple tabs are not aware of each other, and they can't be associated with each other on the Wowza Streaming Engine side either. Wowza would see two different clients from the same IP, but those might be two different devices behind the same router.
Browser tabs are usually isolated from each other in browser implementations for security and stability reasons, which does not allow cross-tab data sharing.
The two players are not in perfect sync: they started at different points in time, and lag and buffering might also introduce a time shift.
I haven't seen any attempt to reduce bandwidth in this scenario. There are solutions to prevent users from playing multiple streams in parallel, though, which I would rather not link here, but search for "wowza forbidding concurrent stream" and you'll find them.
I think what you are thinking of is multicast streaming, where network packets are broadcast over the network and every player receives the same stream. This is used in IPTV systems, but it is not possible over the internet, only on an intranet.

Horizontal scalability for distributed apps, how to achieve that?

I would like to disregard web applications here, because to scale them horizontally, i.e. to use multiple server instances together, it is "sufficient" to just duplicate the server software over the machines and use a sort of router that forwards requests to the least busy server machine.
But what if my server application allows users to engage with each other in real time?
If the response to the request of a certain client X depends on the context of a client Y whose connection is managed by another machine, then "inter-machine" communication is needed.
I'd like to know the kind of "design solutions" that people have used in such cases.
For example, the people at Facebook must have already encountered such a situation when enabling the chat feature of their social app.
Thank you in advance for any advice.
One solution to achieve that is to use distributed caches like memcache (Facebook also uses that approach).
Then all the information that is needed on all nodes is stored in that cache (and in a database if it needs to be permanent), and so all nodes can access that information (with a very small latency between the nodes).
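A small sketch of that pattern, assuming a memcached instance reachable by every node (Python with the pymemcache client; the host, key layout and fields are invented for illustration):
```python
# Nodes share per-user state through memcached so any node can answer for any client.
# The memcached host, key names and stored fields are illustrative assumptions.
import json
from pymemcache.client.base import Client

cache = Client(("memcached.internal", 11211))

def publish_presence(user_id, node_name):
    # Each node records which machine currently holds this user's connection.
    cache.set(f"presence:{user_id}", json.dumps({"node": node_name}), expire=60)

def find_presence(user_id):
    raw = cache.get(f"presence:{user_id}")
    return json.loads(raw) if raw else None

publish_presence("client-y", "edge-3")
print(find_presence("client-y"))  # another node can now route messages to edge-3
```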
You should consider solutions that provide transparent horizontal database scalability and guarantee ACID semantics. There are many solutions that offer this at various levels. The people at Facebook, whom you reference, have solved the problem by accepting eventual consistency, but your question leads me to believe that you can't accept eventual consistency.