Where can I store data for distributing it online? - server

My company has an application that can be installed with the Qt Online Installer. The data is stored on our own server, but over time we found out that the connection is rather slow for users on the other side of the world. So the question is: "What services, designed for this purpose, can we use to store this data?" While investigating, I found information about something called a "Content Delivery Network", but I'm not sure whether that fits or not.
Unfortunately, I don't have enough experience in this area, so maybe somebody who knows more could give me some advice. Thank you!

CloudFront on AWS. It depends on what your content is, but you can probably store it on S3 and then use CloudFront to cache it at edge locations across the globe.

Your research led you to the right topic, because it sounds like you could benefit from a CDN. A CDN stores cached versions of your website, download files, video, etc. on its servers, which form a distributed network across the globe known as Points of Presence (PoPs). When a user requests a file from your website, assuming it is leveraging a CDN, the request actually goes to the closest PoP, which retrieves and serves the file. This improves performance because the user may be very far from your origin server, or your origin server may not have enough resources to answer every request by itself.
How long a CDN caches objects from your site depends on configurable settings. You can tell the CDN how to cache objects using HTTP cache headers. Here is an intro video from Akamai, the largest CDN, with a helpful explanation of HTTP caching headers:
https://www.youtube.com/watch?v=zAxSE1M4yKE
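For a concrete flavor, here is a minimal sketch of an origin server emitting such a header, using Python's built-in http.server; the path and the 24-hour lifetime are arbitrary choices for illustration, not a recommendation:

```python
# Minimal sketch: an origin server that tells any CDN (or browser) in front
# of it how long a response may be cached. The 24-hour lifetime is made up.
from http.server import BaseHTTPRequestHandler, HTTPServer

class OriginHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello from the origin"
        self.send_response(200)
        # Cache publicly for 24 hours; after that, caches must revalidate.
        self.send_header("Cache-Control", "public, max-age=86400")
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8000), OriginHandler).serve_forever()
```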
Cheers.

Related

Audio streaming from Google Cloud Storage and CDNs - costs

So I'm making an app that involves streaming audio (radio-like) from Google Cloud Storage, and I was looking into the costs. It seems it would be much too expensive as is.
For example, let's say I have 10 MB audio files, a user listens to 20 files a day, and I have 2,000 active users. That's 400 GB, or $48/day, i.e. ~$1,440/month just for that.
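To make the arithmetic explicit, here is the same estimate as a quick script; every input is just my assumption above, using decimal GB:

```python
# Back-of-the-envelope version of the estimate above; all inputs are the
# assumed figures from the question, not measurements.
file_size_mb = 10            # one audio file
files_per_user_per_day = 20
active_users = 2000
price_per_gb = 0.12          # assumed GCS egress rate, $/GB

daily_gb = file_size_mb * files_per_user_per_day * active_users / 1000
daily_cost = daily_gb * price_per_gb
print(f"{daily_gb:.0f} GB/day -> ${daily_cost:.2f}/day, ~${daily_cost * 30:,.0f}/month")
# 400 GB/day -> $48.00/day, ~$1,440/month
```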
I then looked into putting a CDN in front of it to minimize direct reads from Storage. Initially that made sense to me: the CDN would cache the audio files, and the clients would get the files from the cache most of the time. However, as I was looking at Fastly's pricing (Fastly is a Google partner and seems like a good fit), I noticed that they seem to price bandwidth from their cache at exactly the same rate as Google Cloud does ($0.12/GB). So unless I'm reading this wrong, putting up the CDN would not save me ANY money. I get that there are other reasons why putting a CDN in front of it could be a good idea, but am I really reading this right?
Also, if you have any other tips on how I should set this up, I'm all ears.
Estimating the cost of such a service is a complex matter. To get an informed answer, and tips regarding possible implementation paths, I would suggest reaching out to a GCP sales representative. Similarly, you should contact the Fastly team to get a precise picture of their pricing model.
Moreover, any estimate we could make here would be outdated as soon as either pricing model changes, which would invalidate the answer and probably lead future readers to wrong conclusions.

CDN for a RESTful API?

I have a RESTful API whose resources update once a week. That is, I update each resource once a week and allow clients to access it. It's an ever-changing calculator.
There are probably 10,000 resources which could be requested.
Is it possible to put something like this behind a CDN? Traditionally, CDNs are for undeniably static content, i.e. images. I'm not sure where my situation sits on the dynamic <-> static spectrum.
Cheers
"90% of the resources might not even get called, and if they are, they will get called a few times only. It won't be a mass of repetitive calls."
Right there in your comments, you just showed me that a CDN is not beneficial to you.
Usually a CDN works like this: on the first call, the object is downloaded from the main server to the regional CDN node and then delivered to the client, meaning the first GET sees no improvement. The following GETs to the same regional node get the speed improvement. If you have little to no repetitive calls, then you will not see any noticeable improvement.
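To make that concrete, here's a toy sketch of the pull-through behavior; the function names and paths are made up, purely for illustration:

```python
# Toy model of a CDN edge node's pull-through cache. fetch_from_origin is a
# hypothetical stand-in for the slow, long-distance request to the origin.
edge_cache = {}

def fetch_from_origin(path):
    return f"content of {path}"

def edge_get(path):
    if path not in edge_cache:                      # first GET: cache miss
        edge_cache[path] = fetch_from_origin(path)  # pull from the origin
    return edge_cache[path]                         # later GETs: served locally

edge_get("/api/resource/42")  # slow: goes all the way to the origin
edge_get("/api/resource/42")  # fast: answered from the edge cache
```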
As I said in the comments, for small files, clients are probably spending as much time on the DNS lookup as on the download. Look into a global DNS solution (like Anycast) to reduce connection times. This is easy to set up and requires little to no maintenance.
I think it's entirely reasonable to put it behind a CDN if you think your content will reach the appropriate level of scale. As long as the cache-control headers are set such that the latest content is loaded when the cached version may be stale, you'll be fine.
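As a hedged illustration of what those headers might look like for weekly-updated resources (the exact values here are a design choice, not a prescription):

```python
# Illustrative cache headers for a resource that changes roughly weekly.
# The one-week max-age and the SHA-256 ETag are assumptions for the sketch.
import hashlib

def cache_headers(resource_body: bytes) -> dict:
    return {
        # Caches (CDN or browser) may reuse the copy for 7 days.
        "Cache-Control": "public, max-age=604800",
        # An ETag lets a stale cache revalidate with a cheap 304 response
        # instead of re-downloading the whole resource.
        "ETag": hashlib.sha256(resource_body).hexdigest(),
    }

print(cache_headers(b'{"result": 42}'))
```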
The main benefit of CDNs comes when resources are requested from a variety of different sources, and so siteY.com can use the same cached version of a resource as siteX.com. Do you anticipate your resources will be requested from various different sources?

Using CouchDB as an interface. Is it an appropriate approach?

Our devices (microscopes with cameras) produce images plus additional information for each image.
Now a middleware supplier wants to connect these devices to a lab automation system. They have to acquire the data, and we have to provide it. What astonished me was their interface suggestion: a very cryptic token-separated format (ASTM E1394-97). Unfortunately, they can't even accommodate images in their protocol, and are aiming to get file paths instead.
I thought this was not an up-to-date approach. While looking for alternatives, I came across CouchDB.
So my idea was: our devices would import the data, including the images, into CouchDB, and the middleware could fetch it from there. It even seems that, using Mustache, we could produce the format they want (ASCII text), placing URLs as image references instead of paths.
My question is: has someone already applied CouchDB to such a use case? It seems a bit of a misuse of CouchDB, since the main intention here is an interface, not data storage. Another point bothering me is that the inventor of CouchDB moved on to another project, Couchbase. Could that mean a lack of support for CouchDB in the future?
Thank you very much for any insights and suggestions!
It's an OK use case, and we're actually using CouchDB in exactly that way: as proxying middleware between medical laboratory analyzers and a LIS. Some of them publish images or PDF data to shared folders, and we just load those files into the related document as attachments.
Moreover, you may like to know that CouchDB is able to manage external processes (aka os_daemons) and take care of their lifespan: restarting them if they terminate, and starting them right after you update the config options through the HTTP interface. This helps with setting up ASTM client and server processes, since that protocol is different from HTTP (which is native to CouchDB); those processes communicate with the devices and create documents as regular CouchDB clients. In the same way, you may set up daemons to monitor shared folders for specific files. And all this is just CouchDB with a few loosely coupled plugins.
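For reference, loading a file into a document as an attachment is just plain HTTP against CouchDB's API; roughly like this (the server URL, database, document ID, and file name are all hypothetical):

```python
# Rough sketch of attaching a microscope image to a CouchDB document over
# its plain HTTP API. Server URL, database, doc ID, and file are made up.
import requests

COUCH = "http://localhost:5984"
DB = "lab_results"

# Create the document holding the per-image metadata.
doc = {"device": "microscope-01", "sample": "S-123",
       "captured": "2014-05-01T10:00:00Z"}
resp = requests.put(f"{COUCH}/{DB}/sample-S-123", json=doc)
rev = resp.json()["rev"]

# Attach the image itself; CouchDB stores it alongside the document.
with open("image-0001.png", "rb") as f:
    requests.put(
        f"{COUCH}/{DB}/sample-S-123/image-0001.png",
        params={"rev": rev},
        data=f,
        headers={"Content-Type": "image/png"},
    )
# The LIS side can now reference the attachment by URL instead of a file path.
```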

Amazon S3 + CloudFront Queries

I am currently making a social-sharing app and I've run into a problem.
First off, S3 in my experience is slow, so I need to sync the data across multiple servers around the world to make it faster for multiple users.
So my question is: I need to create multiple buckets, one for each country, right? Amazon has a list of their server locations. So for each user, I calculate the nearest server and then upload there? How?
Next question: in my app, people can subscribe to others and check for their updates. Realistically, this would not create a speed difference. If someone in Singapore uploaded a piece of text and has a subscriber in the United States, it wouldn't be any quicker for that subscriber, because he has to download a piece of text stored all the way over in Singapore.
All of this is making me confused! I personally find S3 very slow, which is why I am using CloudFront.
Any help? Am I misunderstanding the process? Thanks!
Buckets are not per country; they are per region (EU, US, Asia, etc.).
Secondly, you do not have to work out the closest URL to your S3 buckets; that's what CloudFront is for. You just get a single URL for each bucket, and CloudFront will route the user's request to the closest edge location.
PS: In addition, Amazon replicates data uploaded to your bucket across all edge locations transparently.
Amazon in no way "automatically" replicates your content out to the edge locations. Instead, your content is copied to a single edge location if (and only if) the content is not there (it could be the first pull, or it could have expired) when a user tries to access it from that edge. It is a pull mechanism, not a push. See the "Download Distributions for HTTP Delivery" section of http://aws.amazon.com/cloudfront/
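Putting the two answers together, here is a hedged sketch of the flow in boto3; the bucket name and distribution domain are placeholders you'd replace with your own, and AWS credentials must already be configured:

```python
# Sketch of the single-bucket + CloudFront flow described above.
import boto3

s3 = boto3.client("s3")
BUCKET = "my-app-uploads"                 # one bucket, in one region
CDN_DOMAIN = "d1234abcd.cloudfront.net"   # CloudFront distribution over it

def upload_and_share(local_path: str, key: str) -> str:
    s3.upload_file(local_path, BUCKET, key)
    # Hand out the CloudFront URL: CloudFront routes each viewer to the
    # nearest edge and pulls from S3 only on the first miss.
    return f"https://{CDN_DOMAIN}/{key}"

print(upload_and_share("note.txt", "posts/note.txt"))
```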

Facebook image URLs - how are they kept from un-authorised users?

I'm interested in social networks and have stumbled upon something which makes me curious.
How does Facebook keep people from playing with URLs and gaining access to photos they shouldn't see?
Let me expand; here's an altered example of a Facebook image URL that came up on my feed:
https://fbcdn-sphotos-g-a.akamaihd.net/hphotos-ak-prn1/s480x480/{five_digit_number}_{twelve_digit_number}_{ten_digit_number}_n.jpg
Those with more web application experience will presumably know the answer to this (I suspect it's well understood), but what is to stop me from changing the numbers and seeing other people's photos that I'm possibly not supposed to see?
[I understand that this doesn't work, I'm just trying to understand how they maintain security and avoid this problem]
Many thanks in advance,
Nick
There are a couple of ways you can achieve it.
The first is to link to a script or action that authenticates the request and then returns an image. You can find an example with ASP.NET MVC here. The downside is that it's pretty inefficient, and you run the risk of using twice the bandwidth for each request (once so your server can grab the image from wherever it's stored, and once to serve it to your users).
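A rough sketch of that first option in Python/Flask, since the linked example is ASP.NET; the route, paths, and permission check are illustrative stand-ins, not a hardened implementation:

```python
# Hedged sketch of option one: authenticate the request, then return the
# image. Not production-ready (e.g. image_id is not sanitized here).
from flask import Flask, abort, send_file, session

app = Flask(__name__)
app.secret_key = "change-me"          # needed for session support

IMAGE_DIR = "/srv/private-images"     # hypothetical storage location

def user_may_view(user_id, image_id):
    # Placeholder for a real permission check against your database.
    return user_id is not None

@app.route("/photos/<image_id>")
def photo(image_id):
    if not user_may_view(session.get("user_id"), image_id):
        abort(403)
    # The server fetches and streams the file itself: secure, but this is
    # the double-bandwidth cost mentioned above.
    return send_file(f"{IMAGE_DIR}/{image_id}.jpg", mimetype="image/jpeg")
```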
The second option: you can do like Facebook and just generate obscure URLs for each photo. As Thomas said in his comment, you're not going to guess a 27-digit number.
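If you go that route yourself, don't hand-roll the randomness; something like Python's secrets module does it in one line (the domain below is a placeholder):

```python
# Generating an unguessable identifier, in the spirit of Facebook's long
# numeric photo URLs. 16 random bytes is far harder to guess than 27 digits.
import secrets

photo_token = secrets.token_urlsafe(16)
url = f"https://cdn.example.com/photos/{photo_token}.jpg"  # placeholder domain
print(url)
```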
The third option is, I think, the best, especially if you're using something like Microsoft Azure or Amazon S3. Azure Blob Storage supports Shared Access Signatures, which let you generate temporary URLs for private files. These can be set to expire in a few minutes or to last a lifetime. The files are served directly to the user, and there's no risk if the URL leaks after the expiration period.
Amazon S3 has something similar with Query String Authentication.
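For S3 that looks roughly like this with boto3 (the bucket and key are placeholders, and credentials must already be configured):

```python
# Sketch of S3's query-string authentication: a presigned URL that expires.
import boto3

s3 = boto3.client("s3")
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-private-photos", "Key": "users/42/photo.jpg"},
    ExpiresIn=300,  # link stops working after 5 minutes
)
print(url)  # hand this to the user; S3 verifies the signature itself
```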
Ultimately, you need to figure out your threat model and make a decision weighing the pros and cons of each approach. On Facebook, these are images that have presumably been shared with hundreds of friends. There's a significantly lower expectation of privacy, so authenticating every request may be overkill. A random, hard-to-guess URL is probably sufficient; it lets them serve data through their CDN and minimizes the amount of processing per request. With option 3, you still have the overhead of generating those signed URLs.