What do content distribution network service providers do? - facebook

What is the purpose of CDN service providers?
My guess is that large-scale sites like Facebook, Wikipedia, YouTube, etc. use CDN service providers for some kind of outsourcing.
My understanding:
YouTube keeps its content on these CDNs, and the site itself focuses on algorithms such as searching for videos, suggesting related videos, and keeping users' subscriber lists and playlists.
Does the YouTube site only keep metadata and indexes? Or maybe it also holds one copy of its entire content? The user connects to the YouTube site and searches for a video. The site finds the file name and sends it to the CDN hub along with the user's IP address.
The CDN hub then perhaps locates the CDN node closest to the user and serves the content to the user.
What is the advantage of this approach?
The most important one I can see is that, especially for videos, streaming from the same country is perhaps much faster than streaming from across the globe.
But does distance really matter that much? Are there any concrete numbers that give a sense of the speed difference between getting videos from across the globe and getting them from the same country?
Also, Google doesn't want to install its storage nodes all over the world. It would rather outsource this to CDN service providers, which have already spread their nodes all over the world, so that Google only focuses on the algorithms part (which it mostly keeps secret).
Is my understanding of the picture correct? Any input/pointers would be highly useful.
Thanks,

I learned about the importance of CDNs for website performance a couple of years back, thanks to Yahoo's "Best Practices for Speeding Up Your Web Site".
These rules are often referenced in YSlow, and Yahoo estimated a speed increase of about 20% from using a CDN.
Another "benefit" is parallel downloading, which is discussed at length by one of the above authors in this blog post.
These are some resources that I ran into when looking into site optimization so I just thought I'd share. Besides that, you seem to have a good grasp on the concept.
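To put rough numbers on the "does distance really matter" question, here is a back-of-the-envelope sketch. It assumes light travels through optical fiber at about 200,000 km/s (roughly two thirds of its speed in a vacuum) and that a single TCP connection cannot move more than one window of data per round trip. The distances and the 64 KiB window are illustrative assumptions, not measurements, and real routes are slower than this lower bound.

```python
# Lower bound on round-trip time from distance alone, plus the resulting
# ceiling on throughput for one TCP connection with a fixed window.
# Real paths are longer than the straight line and add queuing delay,
# so measured round-trip times are higher than these figures.

FIBER_SPEED_KM_PER_S = 200_000.0  # approx. speed of light in optical fiber


def min_rtt_ms(distance_km: float) -> float:
    """Smallest possible round-trip time in milliseconds over fiber."""
    return 2.0 * distance_km / FIBER_SPEED_KM_PER_S * 1000.0


def window_limited_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on throughput in Mbit/s: one window per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000.0) / 1_000_000


if __name__ == "__main__":
    window = 64 * 1024  # assumed 64 KiB window (no window scaling)
    # Assumed distances, chosen only for illustration.
    for label, km in [("same country", 500), ("across the globe", 15_000)]:
        rtt = min_rtt_ms(km)
        mbps = window_limited_mbps(window, rtt)
        print(f"{label}: RTT >= {rtt:.0f} ms, throughput <= {mbps:.1f} Mbit/s")
```

Under these assumptions the nearby server has a floor of about 5 ms per round trip and the distant one about 150 ms, so every request, handshake, and window of data pays roughly thirty times more waiting on the long path.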

Related

Crawling Facebook for art/statistics purposes

I'm currently working on a project involving bots that crawl Facebook to create a "neural map" of the connections between users and pages, and I have some worries about its legality.
Considering:
no information is saved or distributed from profiles
bots are polite (no spam, human-like delays between requests, etc.)
the final outcome will be more of a visual representation of the network
I've done a lot of research on this problem, but every question and answer revolves around saving data and making it publicly available, whereas I neither save nor distribute any data.
Could I get in hot water for doing such a project? Or, at worst, will I just get asked to stop?
Thanks a lot!

Is it a bad idea to host a REST API on a CDN?

I'm new to server architecture and have been reading around a lot, but I have not yet formed a solid opinion on whether the setup below is good practice. I was hoping someone with more experience could confirm whether I'm setting up my architecture correctly:
Use Angular Universal to prerender HTML to a CDN (e.g. Cloudflare)
Cloudinary for image assets
One or a few strong machines with nginx handling the load and passing requests to the other servers listed below (all hosted on DigitalOcean):
REST API (Express server)
Database (MongoDB)
I'm really concerned about the speed of my REST API, as DigitalOcean seems to offer significantly fewer regions than a CDN like Cloudflare. How much does this matter for the speed of my service?
I know this might sound ridiculous, but the region issue makes me wonder whether hosting a REST API Express server on a CDN would be better than a place like DigitalOcean. (My instincts tell me I shouldn't do this on a CDN, but I am at a loss for reasons and hope someone can provide clear reasons why I can or can't host an Express REST API server there.)
From my knowledge I would do this a little differently.
A CDN (Content Delivery Network) is used to serve content, hence the name. The CDN itself doesn't serve the content; it routes the user to a server that serves it. For example, say you have servers in the US, France, and Asia, and a user in the UK requests a website whose images are hosted on those servers. The CDN would direct that user to the closest or best server, which in this case would be the one in France.
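A minimal sketch of that "closest server" decision, assuming the choice is made by great-circle distance alone. The server names and coordinates are made up for illustration, and real CDNs typically weigh network latency, load, and routing rather than pure geography.

```python
import math

# Hypothetical server locations as (latitude, longitude) in degrees.
SERVERS = {
    "US (New York)": (40.71, -74.01),
    "France (Paris)": (48.86, 2.35),
    "Asia (Singapore)": (1.35, 103.82),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_server(user, servers=SERVERS):
    """Name of the server geographically closest to the user."""
    return min(servers, key=lambda name: haversine_km(user, servers[name]))


if __name__ == "__main__":
    london = (51.51, -0.13)  # a user in the UK
    print(nearest_server(london))  # prints "France (Paris)"
```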
So to answer your question: it isn't a bad idea to host the RESTful API on the CDN, but you would need multiple servers around the world (if you are going for worldwide coverage) and would use the Cloudflare CDN to direct your traffic.
This is what I would do:
If you're not expecting loads of traffic (like millions), just have 1-2 servers in each location: 1-2 in North America, South America, France (EU), Asia, and maybe Australia. This will give you decent coverage. Then, when you set up your CDN, it should handle who goes where. Using Node and nginx will help you a lot: they are pretty lightweight, so you can get by with cheaper, less powerful servers.
Now, for your databases you can do one of two things. You can have one dedicated server somewhere fairly central, such as France (EU), so that latency is as low as possible for all regions, or you can have multiple databases and keep them in sync. Having multiple databases that sync will be more work and will require quite a bit of research. Having one server is a lot easier to manage.
The database will be your biggest problem: you must decide whether to go with one and deal with the latency, or with multiple and have to manage them and keep them in sync. Keep in mind that you could use a cloud hosting platform for your database. This would help with the issue, because a lot of platforms offer worldwide coverage as well as synchronised databases. You will, however, run into the cost issue when using cloud platforms.
Hope this answers your questions and provides you with the knowledge you need!

Video on demand website's server setup

I want to build a mid-scale anime video watching website, and I am researching how to do it.
-I want to host my own videos (anime episodes). I don't want to use some Russian video service.
-So I want to understand how media websites and CDNs work together. I did lots of googling, but it didn't turn up an answer.
-For example, should I buy two dedicated servers, one for the website and one for storage? And how should these servers communicate with each other?
-Solutions like Amazon AWS and Wowza are beyond my budget. I am looking for a low-cost solution.
I advise you to use at least 2 servers: one for streaming/storage and the other for the web stuff. Streaming uses a lot of resources and could hang the server, and if you keep your website on the same machine you will be completely offline when this happens.
For streaming you can use free software like ffmpeg, nginx (with the RTMP module), or VLC. You could build a great platform with these tools, but the learning curve is steep.
For me Wowza is the best, but you need to pay $65/month per server.
On the web side, to show the videos I recommend Flowplayer, because it uses HTML5 with a Flash fallback, is very easy to implement, and is free.
Well, you have a lot of work to do. Best of luck.

High traffic site. >10 million users a day. VPS or dedicated server?

We're launching an iPhone app soon, and if everything goes well, we might reach tens of millions of users each day.
What server solution would you use for this? I guess a small VPS isn't enough. Is a dedicated server a better choice? Is there any good hosting provider that can provide such servers?
I'm a newbie when it comes to servers, and would like some basic info about how to handle this.
Thanks in advance
Unfortunately, you are not really going to know the app's requirements until the app is launched. It all depends on how much the app needs to communicate with the server and how often users are using the app. Depending on those variables and others, a VPS might be enough, or you may need a dedicated box, or several. It also depends a lot on the performance of the VPS and dedicated boxes, and on how much access to the system you need.
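As a rough way to see how much those variables matter, the sketch below turns a daily user count into requests per second. The 10 million figure comes from the question; the requests per user and the peak factor are assumptions picked only to show the spread, not estimates for any real app.

```python
# Back-of-the-envelope load estimate: daily users -> requests per second.

SECONDS_PER_DAY = 24 * 60 * 60


def average_rps(daily_users: int, requests_per_user: float) -> float:
    """Average requests per second if load were spread evenly over a day."""
    return daily_users * requests_per_user / SECONDS_PER_DAY


def peak_rps(daily_users: int, requests_per_user: float,
             peak_factor: float) -> float:
    """Average load scaled by an assumed peak-to-average ratio."""
    return average_rps(daily_users, requests_per_user) * peak_factor


if __name__ == "__main__":
    users = 10_000_000       # from the question title
    peak_factor = 3.0        # assumption: busiest period is 3x the average
    for per_user in (5, 50):  # assumption: a "quiet" and a "chatty" app
        avg = average_rps(users, per_user)
        peak = peak_rps(users, per_user, peak_factor)
        print(f"{per_user} requests/user/day: "
              f"avg {avg:,.0f} req/s, peak {peak:,.0f} req/s")
```

With these assumed inputs the average load ranges from under 600 to nearly 5,800 requests per second, a tenfold difference that comes entirely from how chatty the app is.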
Ultimately, it seems you may not even know how well the app is going to do, so I suggest you take the cheap and efficient route of using cloud computing. That way you limit your expenses initially, while your app has a small user base. Your capacity can then ramp up as quickly as your app requires (and of course so will the price). That is the benefit of cloud computing: you are not losing money in the beginning, before you have the user base to use your server to its limit. Furthermore, you avoid downtime when or if your server is no longer enough.
Check out Google's Cloud Computing to get a hint of what is possible. I personally like Google's cloud experience, but you have many more options with varying degrees of freedom that you will have to check out. Amazon of course is another possibility.

Webservice standards and DTDs

While brainstorming about six years ago, I had what I thought was a great idea: in the future there could be webservice standards and DTDs that effectively turn the web into a decentralized knowledgebase. I listed several areas where I thought this could be applied, one of which was:
For making data available directly from a business's website: open hours, locations, and contact phone numbers. Suggest a web service standard by which businesses have a standard URL, extended off the main (base) URL for their website, at which a webservice is located. That webservice in turn has a standardized set of services for downloading a list of their locations, contact telephone numbers, and business hours.
It's interesting looking back at these notes now, since this is not how things have evolved. Instead of putting this information only on their website and then letting any search engine or other data aggregator crawl it, businesses are updating it separately on their website, their Facebook page, and Google Maps. Facebook and Google Maps, due to their popularity, have become the solution to the problem I thought my idea would solve.
Is the way things are better than the way I thought they could be? If so then why doesn't my idea fit the reality? If not then what's holding my idea back from being realized?
A lot of this information is available via APIs, but that doesn't mean it doesn't get put in other places as well, through a variety of means. For example, a company may expose information via an API, and its Facebook app might use that API to populate a Facebook page.
Also, various microformats are in use that encapsulate some of this information.
The biggest obstacle is agreeing on what meta-information should be exposed, how it should be exposed, and how it should be accessed.
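To make the agreement problem concrete, here is a sketch of what one agreed-upon shape could look like and how a consumer might read it. Everything in it is hypothetical: the field names, the use of JSON instead of the XML/DTD approach from the question, and the example business are invented for illustration and do not correspond to any existing standard.

```python
import json
from dataclasses import dataclass

# Hypothetical document a business might publish at an agreed URL.
# The field names are invented; no existing standard is implied.
EXAMPLE_PAYLOAD = """
{
  "name": "Example Bakery",
  "locations": [
    {
      "address": "1 Example Street",
      "phone": "+1-555-0100",
      "hours": {"mon": "08:00-17:00", "sat": "09:00-13:00"}
    }
  ]
}
"""


@dataclass
class Location:
    address: str
    phone: str
    hours: dict  # maps a day key such as "mon" to an "HH:MM-HH:MM" string


def parse_business_info(text: str) -> list:
    """Parse the hypothetical payload into a list of Location records."""
    data = json.loads(text)
    return [
        Location(
            address=loc["address"],
            phone=loc["phone"],
            hours=dict(loc.get("hours", {})),
        )
        for loc in data.get("locations", [])
    ]


def has_listed_hours(location: Location, day: str) -> bool:
    """True if the location publishes opening hours for the given day."""
    return day in location.hours


if __name__ == "__main__":
    for loc in parse_business_info(EXAMPLE_PAYLOAD):
        print(loc.address, loc.phone, loc.hours.get("mon", "closed"))
```

Even this tiny example forces choices (day keys, time format, phone format) that every publisher and consumer would have to agree on.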