Can I use Google Cloud Storage for Apache DocumentRoot? - google-cloud-storage

I was reading the docs and saw the following:
Standard Storage is appropriate for storing data that requires low latency access or data that is frequently accessed ("hot" objects), such as serving website content, interactive workloads, or data supporting mobile and gaming applications.
With that said, I wanted to know how I would go about mounting a gs:// bucket? I would prefer to go this route rather than set up NFS/GlusterFS.

You can use gcsfuse to mount a Google Cloud Storage bucket as a filesystem that Apache can read:
gcsfuse is a user-space file system for interacting with Google Cloud Storage.
As of 20 August 2015, the project's README also says:
Current status
Please treat gcsfuse as beta-quality software. Use it for whatever you like, but be aware that bugs may lurk, and that we reserve the right to make small backwards-incompatible changes.
The careful user should be sure to read semantics.md for information on how gcsfuse maps file system operations to GCS operations, and especially on surprising behaviors. The list of open issues may also be of interest.
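For reference, the mount itself is a single gcsfuse invocation. The sketch below wraps it in Python only for illustration (the bucket name and mount point are placeholders I've made up); on a real web server you would more likely put the equivalent line in /etc/fstab or a systemd unit.

    import subprocess

    # Placeholder names; substitute your own bucket and Apache DocumentRoot.
    BUCKET = "my-site-assets"
    MOUNT_POINT = "/var/www/html"

    # Mount the bucket read-only so Apache can serve its objects as files.
    # --implicit-dirs makes object name prefixes show up as directories.
    subprocess.run(
        ["gcsfuse", "--implicit-dirs", "-o", "ro", BUCKET, MOUNT_POINT],
        check=True,
    )

    # Unmount later with: fusermount -u /var/www/html

Keep the beta-quality warning above in mind before pointing a production DocumentRoot at a FUSE mount.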

Related

PostgreSQL data_directory on Google Cloud Storage, possible?

I am new to Google Cloud and was wondering if it is possible to run a PostgreSQL container on Cloud Run with PostgreSQL's data_directory pointed at Cloud Storage?
If that is possible, could you point me to some tutorials/guides on this topic? Also, what are the downsides of this approach?
Edit-0: Just to clarify what I am trying to achieve:
I am learning Google Cloud and want to write a simple application to work with it. I have decided that the backend code will run as a container on Cloud Run and that the persistent data (i.e. the database files) will reside on Cloud Storage. Because this is a small app for learning purposes, I am trying to use as few moving parts as possible on the backend (and ideally only ones that are always free). Both PostgreSQL and the backend code will live in the same container; only the actual data files will reside on Cloud Storage. Is this approach correct? Are there better approaches that achieve the same minimalism?
Edit-1: Okay, I got the answer! The Google documentation here mentions the following:
"Don't run a database over Cloud Storage FUSE!"
Buckets are not meant to store database data; the relevant throughput guidance is the following:
There is no limit to writes across multiple objects, which includes uploading, updating, and deleting objects. Buckets initially support roughly 1000 writes per second and then scale as needed.
There is no limit to reads of objects in a bucket, which includes reading object data, reading object metadata, and listing objects. Buckets initially support roughly 5000 object reads per second and then scale as needed.
One alternative is to use a separate persistent disk for your PostgreSQL database on Google Compute Engine. You can follow the "How to Set Up a New Persistent Disk for PostgreSQL Data" Community Tutorial.

Google Cloud Storage quota hit - how?

When my app is trying to access files in a bucket using a SignedURL, a 429 response is received:
<Error>
<Code>InsufficientQuota</Code>
<Message>
The App Engine application does not have enough quota.
</Message>
<Details>App s~[myappname] not have enough quota</Details>
</Error>
This error continues until the end of the day, when the quota is apparently reset, then I can use storage again. It's only a small app and does not have much usage. The project that contains the storage is set up to use billing. The files are being accessed from another project, which is also set up to use billing.
I'm not aware that Google Cloud Storage has any quotas that could be hit in this fashion. The only ones I know of are the ones here: https://cloud.google.com/storage/quotas but as far as I am aware, none of them apply.
Buckets are not being created or destroyed.
Updates are not being made to buckets.
There are only a couple of IAM identities.
There are no Pub/Sub notifications.
Objects stored in the buckets are small.
Is there any way I can find out why the quota is being exceeded?
It turns out it was because of a spending limit I had set on App Engine. I didn't think those spending limits applied any more, but it turns out that only holds for new projects. Spending limits that were already set on existing projects are still in effect, and I can personally attest that they do work!
Thanks for the comments @KevinQuinzel and @gso_gabriel.

Need advice: How to share a potentially large report to remote users?

I am asking for advice on possibly better solutions for the part of the project I'm working on. I'll first give some background and then my current thoughts.
Background
Our clients can use my company's products to generate potentially large data sets for use in their industry. When the data sets are generated, the clients will file a processing request with us.
We want to send the clients a summary email which contains some statistical charts as well as sampling points from the data sets so they can do some initial quality control work. If the data sets are of bad quality, they don't need to file any request.
One problem is that the charts and sampling points can be too large to send in an email. The charts and sampling points we want to include in the emails are pictures. Although we can use a lower-quality format such as JPEG to save space, we cannot control how many data sets will be included in the summary email, so the total size could still exceed the normal email size limit.
In terms of technologies, we are mainly developing in Python on Ubuntu 14.04.
Goals of the Solution
In general, we want to present a report-like document to the clients for initial QA. The report may contain external links but does not need to be very interactive. In other words, a static report should be fine.
We want to reduce the steps our clients must take to read the report. For example, if the report can be just an email, the user only needs to (1) log in and (2) open the email. If they use an email client, they may skip (1) and just open the report and begin reading.
We also want to minimize the burden of maintaining extra user accounts, for both us and our clients. For example, if a solution requires us to register a new user account, it is, although still acceptable, not ranked very highly.
Security is important because our clients don't want their reports to be read by unauthorized third parties.
We want the process automated. We want the solution to provide programming interface so that we can automate the report sending/sharing process.
Performance is NOT a critical issue. Our user base is not large, at most in the hundreds, I think. They also don't generate data that frequently, at most once a week. We don't need real-time responses; even a delay of a few hours is still acceptable.
My Current Thoughts of Solution
Possible solution #1: In-house web service. I can set up a server machine and develop our own web service. We put the report into our database and the clients can then query it over the Internet.
Possible solution #2: Amazon Web Services. AWS is quite mature, but I'm not sure whether it would be expensive, because so far we just want to share a report with our remote clients, which doesn't seem like a big enough job to justify AWS.
Possible solution #3: Google Drive. I know Google Drive provides API to do uploading and sharing programmatically, but I think we need to register a dedicated Google account to use that.
Any better solutions??
You could possibly use AWS S3 and CloudFront. Files can easily be loaded into S3 using the AWS SDKs and API. You can then use the API to generate secure links to the files that can only be opened for a specific time and, optionally, only from a specific IP.
Files on S3 can also be automatically cleaned up after a specific time if needed using lifecycle rules.
Storage and transfer prices are fairly cheap with AWS, and remember that the S3 storage cost indicated is per month, so if you only keep an object for a few days then you only pay for those few days.
S3: http://aws.amazon.com/s3/pricing
Cloudfront: https://aws.amazon.com/cloudfront/pricing/
Here's a list of the SDKs for AWS:
https://aws.amazon.com/tools/#sdk
Or you can use their command-line tools for Windows batch or PowerShell scripting:
https://aws.amazon.com/tools/#cli
Here's some info on how the private content URLs are created:
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PrivateContent.html
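Since the question mentions a Python stack, here is a minimal sketch of the expiring-link idea using boto3; the bucket and key names are placeholders, and restricting by IP would additionally require CloudFront signed URLs with a custom policy rather than this plain S3 call.

    import boto3

    s3 = boto3.client("s3")

    # Placeholder names; the generated report is uploaded first.
    bucket = "client-reports"
    key = "acme/summary-report.pdf"
    s3.upload_file("summary-report.pdf", bucket, key)

    # Create a link that stops working after 24 hours; anyone holding the
    # URL can download the object until then, no AWS account required.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=24 * 3600,
    )
    print(url)  # e-mail this link to the client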
I would suggest building this service using a mix of your #1 and #2 options. You can do the processing yourselves and leverage AWS S3, which is quite cheap, for transferring the data.
For example, 100 GB costs approximately $3.
AWS S3 is also beneficial for disaster recovery: if anything happens to your local environment, your data will be safe in S3.
For security, you can leverage data encryption and signed URLs in AWS S3.
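As a small illustration of the encryption point, server-side encryption can be requested per object at upload time; the bucket and key names below are the same placeholders as above.

    import boto3

    s3 = boto3.client("s3")

    # Ask S3 to encrypt the object at rest with S3-managed keys (SSE-S3).
    with open("summary-report.pdf", "rb") as f:
        s3.put_object(
            Bucket="client-reports",
            Key="acme/summary-report.pdf",
            Body=f,
            ServerSideEncryption="AES256",
        )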

Google Cloud Platform - Data Distribution

I am trying to figure out a proper solution for the following:
We have a client from whom we want to receive data, for instance a 200 MB binary that is updated daily. We want them to deposit the data file(s) on a local server near them (Europe).
We then want to do one of the following:
We retrieve the data from a local server where we are (China/HK), or
We log into their European server where they have deposited the files and pull the files directly ourselves.
QUESTIONS:
Can Google's cloud platform serve as a secure, easy way to provide a cloud drive on which to store and from which to pull the data files?
Does Google's cloud platform distribute data such that files pushed to a server in Europe will be mirrored on a server in East Asia? (That is, where and how would this distribution model work for my example?)
For storing binary data, Google Cloud Storage is a fine solution. To answer your questions:
Secure: yes. Easy: yes, in that you don't need to write different code depending on your location, but there is a caveat on performance.
Google Cloud Storage replicates files for durability and availability, but it doesn't mirror files across all bucket locations. So for the best performance, you should store the data in a bucket located where you will access it the most frequently. For example, if you create the bucket and choose its location to be Europe, transfers to your European server will be fast but transfers to your HK server will be slow. See the Google Cloud Storage bucket locations documentation for details.
If you need frequent access from both locations, you could create one bucket in each location and keep them in sync with a tool like gsutil rsync.
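gsutil rsync is the simplest way to do that, but if you prefer to script it, a naive one-way sync with the google-cloud-storage Python client looks roughly like this (the bucket names are placeholders, and unlike gsutil rsync this sketch only copies objects that are missing, without comparing checksums or timestamps):

    from google.cloud import storage

    client = storage.Client()

    # Placeholder bucket names, one per region.
    src = client.bucket("my-data-eu")    # e.g. a bucket located in Europe
    dst = client.bucket("my-data-asia")  # e.g. a bucket located in Asia

    already_there = {blob.name for blob in client.list_blobs(dst)}

    # Server-side copy of any object not yet present in the Asia bucket.
    for blob in client.list_blobs(src):
        if blob.name not in already_there:
            src.copy_blob(blob, dst, new_name=blob.name)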

Store files on disk or MongoDB

I am creating a MongoDB/Node.js blogging system (similar to WordPress).
I currently have the images being saved on disk, with a pointer stored in Mongo. Since I already store all sessions in MongoDB to enable easy load balancing across servers, would storing the actual files in Mongo also be a smart idea for easy multi-server setups and/or performance gains?
If everything is stored in the DB, you can simply spawn more web servers and/or Mongo replicas to scale horizontally.
Opinions?
MongoDB is a good option for storing your files (I'm talking about GridFS), especially for the use case you described above.
When you store files in MongoDB (GridFS, not regular documents), you get all the replication and sharding capability for free, which is awesome.
If you have to spawn a new server and the files are already in MongoDB, all you have to do is enable replication (and thus scale horizontally). I'm sure this can save you a lot of headaches.
Resources:
Is GridFS fast and reliable enough for production?
http://www.mongodb.org/display/DOCS/GridFS
http://www.coffeepowered.net/2010/02/17/serving-files-out-of-gridfs/
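To make the GridFS suggestion concrete, here is a minimal sketch using PyMongo's gridfs module (the Node.js driver exposes the same feature); the database and file names are placeholders.

    import gridfs
    from pymongo import MongoClient

    db = MongoClient()["blog"]  # placeholder database name
    fs = gridfs.GridFS(db)

    # Store an uploaded image. GridFS chunks it into the fs.chunks collection
    # and records metadata in fs.files, both of which replicate and shard
    # like any other collection.
    with open("header.png", "rb") as f:
        file_id = fs.put(f, filename="header.png")

    # Later, stream it back out (e.g. from a request handler).
    image_bytes = fs.get(file_id).read()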
Aside from GridFS, if you are considering a cloud-based deployment, you might also look at cloud-specific storage (Windows Azure has Blob Storage, for example). Sticking with Windows Azure for this example (since that's what I work with), you'd reference a file by its storage account URI. For example:
https://mystorageacct.blob.core.windows.net/mycontainer/myvideo.wmv
Since you'd be storing the MongoDB database itself in its own blob (mounted as a disk volume on your Linux or Windows VM), you could then choose to store your files in either the same storage account or a completely different storage account (with each storage account providing 200 TB of storage).
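For what it's worth, uploading and referencing a blob looks roughly like this with the azure-storage-blob Python SDK; the connection string, container, and file names are placeholders, and this is only a sketch of the idea rather than the exact API that was current when this answer was written.

    from azure.storage.blob import BlobServiceClient

    # Placeholder connection string for the storage account.
    service = BlobServiceClient.from_connection_string("<connection-string>")
    blob = service.get_blob_client(container="mycontainer", blob="myvideo.wmv")

    # Upload the file; it then becomes addressable at its storage-account URI,
    # e.g. https://mystorageacct.blob.core.windows.net/mycontainer/myvideo.wmv
    with open("myvideo.wmv", "rb") as f:
        blob.upload_blob(f, overwrite=True)
    print(blob.url)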
Storing the images as regular documents in MongoDB would be a bad idea, as resources that could be used to serve a large amount of informational data would instead be spent sending files.
Have a look at MongoDB's file storage, GridFS; that might solve your problem of storing images while providing horizontal scalability as well.
http://www.mongodb.org/display/DOCS/GridFS
http://www.mongodb.org/display/DOCS/GridFS+Specification