I am currently building a social sharing app and have run into a problem.
First off, S3 in my experience is slow, so I need to sync the data across multiple servers around the world to make it faster for users in different places.
So my question is: do I need to create multiple buckets, one for each country? Amazon publishes a list of their server locations. So for each user, do I calculate the nearest server and then upload there? How?
Next question: in my app, people can subscribe to others and check for their updates. Realistically, uploading to the nearest bucket would not make a speed difference here. If someone in Singapore uploads a piece of text and has a subscriber in the United States, it wouldn't be any quicker for that subscriber, because he has to download a piece of text stored all the way over in Singapore.
All of this is confusing me! I personally find S3 very slow, which is why I am using CloudFront.
Any help? Am I misunderstanding the process? Thanks!
Buckets are not per country; they are per region (EU, US, Asia, etc.).
Secondly, you do not have to work out the closest URL to your S3 bucket yourself; that's what CloudFront is for. You just get a single URL for each bucket, and CloudFront will route the user's request to the closest edge location.
PS: Amazon also replicates data uploaded to your bucket across all edge locations transparently.
Amazon in no way "automatically" replicates your content out to the edge locations. Instead, your content is copied to a single edge location if (and only if) the content is not there (it could be the first pull, or it could have expired) when a user tries to access it from that edge. It is a pull mechanism, not a push. See the "Download Distributions for HTTP Delivery" section of http://aws.amazon.com/cloudfront/
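Since CloudFront pulls from the origin and re-pulls when an object expires, you usually control edge caching through the object's Cache-Control header, set at upload time. A minimal sketch with boto3 (bucket and key names are made up):

```python
import boto3

s3 = boto3.client("s3")
# The Cache-Control header tells CloudFront (and browsers) how long the
# object may be served from cache before the edge pulls it from the
# origin again. Bucket and key here are hypothetical.
with open("photo.jpg", "rb") as f:
    s3.put_object(
        Bucket="my-origin-bucket",
        Key="images/photo.jpg",
        Body=f,
        ContentType="image/jpeg",
        CacheControl="max-age=86400",  # cacheable for one day
    )
```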
My company has an application that can be installed with Qt Online Installers. The data are stored on our own server, but over time we found that the internet connection is a bit slow for users on the other side of the world. So the question is: what services, designed for this purpose, can we use to host this data? While investigating, I found information about something called a "Content Delivery Network" (CDN), but I'm not sure whether it fits our needs.
Unfortunately, I don't have enough experience in this area, so maybe somebody who knows more could give me some advice. Thank you!
CloudFront on AWS. It depends on what your content is, but you can probably store it on S3 and then use CloudFront to cache it at edge locations across the globe.
Your research led you to the right topic, because it sounds like you could benefit from a CDN. CDNs store cached versions of your website, download files, video, etc. on their servers, which form a distributed network across the globe; the individual sites are known as 'Points of Presence' (PoPs). When a user requests a file from your website, assuming it is leveraging a CDN, the request actually goes to the closest PoP, which serves the file. This improves performance because the user may be very far from your origin server, or your origin server may not have enough resources to answer every request by itself.
How long a CDN caches objects from your site depends on configurable settings. You can tell the CDN how to cache objects using HTTP cache headers. Here is an intro video from Akamai, the largest CDN, with a helpful explanation of HTTP caching headers:
https://www.youtube.com/watch?v=zAxSE1M4yKE
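To see these headers in practice, you can inspect any asset's response headers yourself; a tiny sketch (the URL is just a placeholder):

```python
import requests

# A HEAD request fetches only the headers, not the file itself
r = requests.head("https://www.example.com/assets/logo.png")
print(r.headers.get("Cache-Control"))  # e.g. "public, max-age=86400"
print(r.headers.get("Expires"))        # the older alternative to max-age
```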
Cheers.
So I'm making an app that involves streaming radio-like audio from Google Cloud Storage, and I was looking into the costs. It seems it would be much too expensive as is.
e.g. Let's say I have 10 MB audio files, a user listens to 20 files a day, and I have 2,000 active users. That's 400 GB, or $48/day, i.e. ~$1,440/month just for that.
I then looked into putting a CDN in front of it to minimize direct reads from Storage. Initially that made sense to me: the CDN would cache the audio files, and clients would get the files from the cache most of the time. However, as I was looking at Fastly's pricing (Fastly is a Google partner and seems like a good fit), I noticed that they seem to price bandwidth out of their cache at the exact same rate as Google Cloud does ($0.12/GB). So unless I'm reading this wrong, putting up the CDN would not save me ANY money. I get that there are other reasons why putting a CDN in front of it could be a good idea, but am I really reading this right?
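To make the trade-off concrete, here is a back-of-the-envelope model using the numbers above. It shows that when the CDN's egress rate equals the storage provider's, the cache only removes the origin-fetch traffic, so the total can even come out slightly higher (all rates and the hit ratio are assumptions, not current prices):

```python
# Back-of-envelope cost model; every rate here is an assumption taken
# from the question above, so check current pricing before relying on it.
FILE_MB = 10
FILES_PER_USER_PER_DAY = 20
USERS = 2000
GB_PER_DAY = FILE_MB * FILES_PER_USER_PER_DAY * USERS / 1000  # 400 GB

GCS_EGRESS = 0.12  # $/GB served straight from Cloud Storage
CDN_EGRESS = 0.12  # $/GB served from the CDN edge (same rate here)
HIT_RATIO = 0.95   # assumed fraction of requests answered from CDN cache

direct = GB_PER_DAY * GCS_EGRESS
with_cdn = GB_PER_DAY * CDN_EGRESS + GB_PER_DAY * (1 - HIT_RATIO) * GCS_EGRESS

print(f"direct:   ${direct:.2f}/day")    # $48.00/day
print(f"with CDN: ${with_cdn:.2f}/day")  # $50.40/day -- slightly more
```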
Also, if you have any other tips on how I should set this up, I'm all ears.
Estimating the invoice for such a service is a complex matter. To get an informed answer and tips about possible implementation paths, I would suggest reaching out to a GCP sales representative. Similarly, you should contact the Fastly team to get a precise picture of their pricing model.
Moreover, any estimate we could make here would become outdated as soon as either pricing model changes, which would invalidate the answer and probably lead future readers to wrong conclusions.
I am asking for advice on possibly better solutions for the part of the project I'm working on. I'll first give some background and then my current thoughts.
Background
Our clients can use my company's products to generate potentially large data sets for use in their industry. When the data sets are generated, the clients will file a processing request to us.
We want to send the clients a summary email which contains some statistical charts as well as sampling points from the data sets so they can do some initial quality control work. If the data sets are of bad quality, they don't need to file any request.
One problem is that the charts and sampling points can be too large to send in an email. The charts and the sampling points we want to include in the emails are pictures. Although we can use a compressed format such as JPEG to save space, we cannot control how many data sets will be included in the summary email, so the total size could still exceed the normal email size limit.
In terms of technologies, we are mainly developing in Python on Ubuntu 14.04.
Goals of the Solution
In general, we want to present a report-like document to the clients for initial QA. The report may contain external links but does not need to be very interactive. In other words, a static report should be fine.
We want to reduce the steps our clients must take to read the report. For example, if the report can simply be an email, the user only needs to 1) log in and 2) open the email. If they use an email client, they may skip 1) and just open and begin to read.
We also want to minimize the burden of maintaining extra user accounts, for both us and our clients. For example, if a solution requires us to register a new user account, it is still acceptable but not ranked very high.
Security is important because our clients don't want their reports to be read by unauthorized third parties.
We want the process automated. We want the solution to provide programming interface so that we can automate the report sending/sharing process.
Performance is NOT a critical issue. Our user base is not large; I think a few hundred at most. They also don't generate data that frequently, at most once a week. We don't need real-time responses; even a delay of a few hours is acceptable.
My Current Thoughts on a Solution
Possible solution #1: In-house web service. I can set up a server machine and develop our own web service. We put the report into our database, and the clients can then query it over the Internet.
Possible solution #2: Amazon Web Services. AWS is quite mature, but I'm not sure whether it would be expensive, because so far we just want to share a report with our remote clients, which doesn't seem like a big enough job to warrant AWS.
Possible solution #3: Google Drive. I know Google Drive provides API to do uploading and sharing programmatically, but I think we need to register a dedicated Google account to use that.
Any better solutions??
You could use AWS S3 and CloudFront. Files can easily be loaded into S3 using the AWS SDKs and API. You can then use the API to generate secure links to the files that are only valid for a limited time and, optionally, only from a specific IP.
Files on S3 can also be automatically cleaned up after a specific time if needed using lifecycle rules.
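For what it's worth, a minimal sketch of such a lifecycle rule with boto3 (bucket name and prefix are hypothetical):

```python
import boto3

s3 = boto3.client("s3")
# Delete anything under reports/ seven days after it was uploaded
s3.put_bucket_lifecycle_configuration(
    Bucket="example-reports",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-reports",
                "Filter": {"Prefix": "reports/"},
                "Status": "Enabled",
                "Expiration": {"Days": 7},
            }
        ]
    },
)
```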
Storage and transfer prices are fairly cheap with AWS, and remember that the S3 storage cost indicated is per month, so if you only keep an object for a few days you only pay for those few days.
S3: http://aws.amazon.com/s3/pricing
Cloudfront: https://aws.amazon.com/cloudfront/pricing/
Here's a list of the SDKs for AWS:
https://aws.amazon.com/tools/#sdk
Or you can use their command-line tools for Windows batch or PowerShell scripting:
https://aws.amazon.com/tools/#cli
Here's some info on how the private content urls are created:
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PrivateContent.html
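As a rough illustration of those private content URLs, here is a sketch using botocore's CloudFront signer; the key pair ID, key file, and distribution domain are placeholders you would replace with your own:

```python
import datetime
import rsa  # pip install rsa
from botocore.signers import CloudFrontSigner

def rsa_signer(message):
    # Sign with the private key of a CloudFront trusted key pair
    with open("cloudfront_private_key.pem", "rb") as f:
        key = rsa.PrivateKey.load_pkcs1(f.read())
    return rsa.sign(message, key, "SHA-1")

signer = CloudFrontSigner("APKAEXAMPLEKEYID", rsa_signer)
url = signer.generate_presigned_url(
    "https://d111111abcdef8.cloudfront.net/reports/summary.pdf",
    date_less_than=datetime.datetime.utcnow() + datetime.timedelta(hours=1),
)
print(url)  # the link stops working after one hour
```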
I would suggest building this service using a mix of your #1 and #2 options: do the processing yourselves and leverage AWS S3, which is quite cheap, for transferring the data.
Example: 100 GB costs approximately $3 per month.
AWS S3 is also beneficial for disaster recovery: if anything happens to your local environment, your data will be safe in S3.
For security, you can leverage data encryption and signed URLs in AWS S3.
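A small sketch of what the encrypted upload could look like with boto3 (bucket and key names are made up):

```python
import boto3

s3 = boto3.client("s3")
# Encrypt the report at rest; access is then handed out via signed URLs
with open("summary.pdf", "rb") as f:
    s3.put_object(
        Bucket="example-client-reports",
        Key="client-42/summary.pdf",
        Body=f,
        ServerSideEncryption="AES256",
    )
```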
I have a website in which users upload various files and later access them.
The files are currently stored under a specific path on the server. Now, if I need multiple servers for the website, what is the best way to make the user-uploaded files accessible across all of them? Amazon S3 is one option that has crossed my mind. What other options do I have?
First, you can try using a CDN (http://en.wikipedia.org/wiki/Content_delivery_network).
Also, you can build it in house by setting up specialized servers for static content. You will probably need a lookup server to record, for each file, which server it can be found on. It would also contain the logic to determine the best server on which to save a file; a toy sketch of that placement logic follows. This approach is more complicated, as you will have to do the load balancing yourself and take the geographic location of users into account.
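As a toy version of that placement logic, a deterministic hash of the file name can pick the storage server, so every web node agrees on where a file lives without asking a central database (server names are made up):

```python
import hashlib

STATIC_SERVERS = [
    "static1.example.com",
    "static2.example.com",
    "static3.example.com",
]

def server_for(filename: str) -> str:
    # The same file name always hashes to the same server
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    return STATIC_SERVERS[int(digest, 16) % len(STATIC_SERVERS)]

print(server_for("user42/avatar.png"))  # e.g. static2.example.com
```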
I'm interested in social networks and have stumbled upon something which makes me curious.
How does facebook keep people from playing with URLs and gaining access to photos they should not?
Let me expand. Here's an altered example of a Facebook image URL that came up in my feed:
https://fbcdn-sphotos-g-a.akamaihd.net/hphotos-ak-prn1/s480x480/{five_digit_number}_{twelve_digit_number}_{ten_digit_number}_n.jpg
So, those with more web application experience will presumably know the answer to this; I suspect it's well understood. But what is to stop me from changing the numbers and seeing other people's photos that I'm possibly not supposed to see?
[I understand that this doesn't work, I'm just trying to understand how they maintain security and avoid this problem]
Many thanks in advance,
Nick
There are a couple of ways you can achieve this.
The first is to link to a script or action that authenticates the request and then returns the image. You can find an example with ASP.NET MVC here. The downside is that it's pretty inefficient, and you risk using twice the bandwidth for each request (once for your server to grab the image from wherever it's stored, and once to serve it to your users).
The second option: do as Facebook does and just generate obscure URLs for each photo. As Thomas said in his comment, you're not going to guess a 27-digit number. A sketch of generating such a URL is below.
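A sketch of generating such an unguessable URL in Python (the CDN domain is a placeholder):

```python
import secrets

# 20 random bytes -> ~27 URL-safe characters, infeasible to enumerate
token = secrets.token_urlsafe(20)
url = f"https://cdn.example.com/photos/{token}.jpg"
print(url)
```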
The third option, I think, is the best, especially if you're using something like Microsoft Azure or Amazon S3. Azure Blob Storage supports Shared Access Signatures, which let you generate temporary URLs for private files. These can be set to expire in a few minutes or to last a lifetime. The files are served directly to the user, and there's no risk if the URL leaks after the expiration period.
Amazon S3 has something similar with Query String Authentication.
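For S3, a presigned URL sketch with boto3 (bucket and key are hypothetical):

```python
import boto3

s3 = boto3.client("s3")
# Anyone holding this URL can GET the object, but only for five minutes
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "example-photos", "Key": "users/42/photo.jpg"},
    ExpiresIn=300,
)
print(url)
```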
Ultimately, you need to figure out your threat model and make a decision weighing the pros and cons of each approach. On Facebook, these are images that have presumably been shared with hundreds of friends; there's a significantly lower expectation of privacy, so authenticating every request may be overkill. A random, hard-to-guess URL is probably sufficient; it lets them serve data through their CDN and minimizes the amount of processing per request. With option 3, you still have the overhead of generating those signed URLs.