Need advice: How to share a potentially large report to remote users? - email

I am asking for advice on possibly better solutions for the part of the project I'm working on. I'll first give some background and then my current thoughts.
Background
Our clients can use my company's products to generate potentially large data sets for use in their industry. When the data sets are generated, the clients will file a processing request to us.
We want to send the clients a summary email which contains some statistical charts as well as sampling points from the data sets so they can do some initial quality control work. If the data sets are of bad quality, they don't need to file any request.
One problem is that the charts and sampling points can potentially be too large to send in an email. The charts and the sampling points we want to include in the emails are pictures. Although we can use a compressed format such as JPEG to save space, we cannot control how many data sets would be included in the summary email, so the total size could still exceed the normal email size limit.
In terms of technologies, we are mainly developing in Python on Ubuntu 14.04.
Goals of the Solution
In general, we want to present a report-like thing to the clients to do some initial QA. The report may contain external links but does not need to be very interactive. In other words, a static report should be fine.
We want to reduce the steps or things that our clients must do to read the report. For example, if the report can be just an email, the user only needs to (1) log in and (2) open the email. If they use client software, they may skip (1) and just open the email and begin to read.
We also want to minimize the burden of maintaining extra user accounts for both us and our clients. For example, if the solution requires us to register a new user account, this solution is, although still acceptable, not ranked very high.
Security is important because our clients don't want their reports to be read by unauthorized third parties.
We want the process automated, so the solution should provide a programming interface that lets us automate the report sending/sharing process.
Performance is NOT a critical issue. Our user base is not large. I think at most in hundreds. They also don't generate data that frequently, at most once a week. We don't need real-time response. Even a delay of a few hours is still acceptable.
My Current Thoughts of Solution
Possible solution #1: In-house web service. I can set up a server machine and develop our own web service. We put the report into our database and the clients can then query via the Internet.
Possible solution #2: Amazon Web Services. AWS is quite mature, but I'm not sure whether it would be expensive. So far we just want to share a report with our remote clients, which doesn't seem like a big enough deal to justify AWS.
Possible solution #3: Google Drive. I know Google Drive provides an API for uploading and sharing programmatically, but I think we need to register a dedicated Google account to use that.
Any better solutions??

You could possibly use AWS S3 and CloudFront. Files can easily be loaded into S3 using the AWS SDKs and API. You can then use the API to generate secure links to the files that can only be opened for a specific time and optionally from a specific IP.
Files on S3 can also be automatically cleaned up after a specific time if needed using lifecycle rules.
Storage and transfer prices are fairly cheap with AWS. Remember that the S3 storage cost indicated is per month, so if you only have an object loaded for a few days then you only pay for a few days.
S3: http://aws.amazon.com/s3/pricing
Cloudfront: https://aws.amazon.com/cloudfront/pricing/
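To make the "you only pay for a few days" point concrete, here is a rough arithmetic sketch. The rate of $0.03 per GB-month is a hypothetical figure used only for illustration, not a quoted AWS price, and the function name is made up; check the pricing pages above for real numbers.

```python
def prorated_storage_cost(size_gb, days_stored, rate_per_gb_month, days_in_month=30):
    """Estimate the storage cost of an object that is kept for only part of a month."""
    return size_gb * rate_per_gb_month * (days_stored / days_in_month)


# Hypothetical rate in USD per GB-month (an assumption, not a quoted price).
ASSUMED_RATE = 0.03

full_month = prorated_storage_cost(100, 30, ASSUMED_RATE)
three_days = prorated_storage_cost(100, 3, ASSUMED_RATE)
print("100 GB for a month: ${:.2f}".format(full_month))
print("100 GB for 3 days:  ${:.2f}".format(three_days))
```

Under that assumed rate, a 100 GB report set kept for three days costs about a tenth of the monthly figure.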
Here's a list of the SDKs for AWS:
https://aws.amazon.com/tools/#sdk
Or you can use their command line tools for Windows batch or powershell scripting:
https://aws.amazon.com/tools/#cli
Here's some info on how the private content URLs are created:
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PrivateContent.html
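As a rough illustration of the "specific time and optionally from a specific IP" restriction, the sketch below builds the kind of custom policy document that CloudFront signed URLs are based on. It only builds the policy; the signing step (done with your CloudFront key pair) is left out and would normally be handled through one of the SDKs. The function name, the example URL and the IP range are hypothetical.

```python
import json
import time


def build_custom_policy(resource_url, valid_seconds, source_ip=None, now=None):
    """Build a CloudFront-style custom policy as compact JSON (signing not included)."""
    now = int(time.time()) if now is None else int(now)
    # The link stops working once this epoch time has passed.
    condition = {"DateLessThan": {"AWS:EpochTime": now + valid_seconds}}
    if source_ip is not None:
        # Optional restriction to a single address or CIDR range.
        condition["IpAddress"] = {"AWS:SourceIp": source_ip}
    policy = {"Statement": [{"Resource": resource_url, "Condition": condition}]}
    # Compact separators keep whitespace out of the policy document.
    return json.dumps(policy, separators=(",", ":"))


# Hypothetical report URL and client address, valid for 24 hours.
print(build_custom_policy(
    "https://example.cloudfront.net/reports/summary.html",
    valid_seconds=24 * 3600,
    source_ip="203.0.113.10/32",
))
```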

I would suggest building this service using a mix of your #1 and #2 options. You can do the processing yourself and, for transferring the data, leverage AWS S3, which is quite cheap.
Example: 100 GB costs approximately $3.
AWS S3 is also beneficial because you are covered against any disaster in your local environment: your data will be safe in S3.
For security you can leverage data encryption and signed URLs in AWS S3.
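With the charts stored in S3, the summary email itself can stay small and carry only links. Below is a minimal sketch using Python's standard email package. The function name, subject line, addresses and link are placeholders, and generating the signed URLs and actually sending the message (for example with smtplib) are not shown.

```python
from email.message import EmailMessage


def build_summary_email(sender, recipient, report_links):
    """Build a small plain-text email that links to reports instead of attaching them.

    report_links is a list of (data_set_name, url) pairs; the URLs are assumed
    to be signed, time-limited links generated elsewhere.
    """
    msg = EmailMessage()
    msg["Subject"] = "Data set QA summary"
    msg["From"] = sender
    msg["To"] = recipient
    lines = ["Your QA reports are ready. The links below expire after a limited time.", ""]
    for name, url in report_links:
        lines.append("{}: {}".format(name, url))
    msg.set_content("\n".join(lines))
    return msg


# Placeholder addresses and link, for illustration only.
message = build_summary_email(
    "reports@example.com",
    "client@example.org",
    [("data-set-001", "https://example.com/reports/001?signature=PLACEHOLDER")],
)
print(message["Subject"])
```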

Related

Dropbox app with tiered users

Preface:
I'm hoping to upgrade an existing application by adding cloud backup and syncing of the customer's data. We want this to be as seamless as possible, but also for the customer's only interface to the data to be via the application's front-end interface.
Our application can be connected to the oil pipe of a machine and collects data on the oil condition. When a test has completed we want to push this to the cloud. Because of the distinct test nature of the data (as opposed to one big trend), most IoT platforms don't suit it very well, so we're aiming to release a slightly modified version of the application which doesn't have the connection to the sensors, and this will be our remote front-end.
Since the existing application uses a relatively simple file structure to store its data, if we simply replicate these files in the cloud, the remote front-end version can just download these to the same location and it'll work fine. This has led us to Dropbox (or any recommended more appropriate cloud storage system).
We hope to use the Dropbox API directly in our application to push and pull the files as necessary. All of this so far we believe is perfectly achievable.
Question: Is it possible, and if so how would we go about it, to set up a user system with the requirements below?
The user's personal Dropbox is not used
Dropbox is completely hidden from the user
The application vendor has a top-level user who has access to all data (for analytics; we do not want to store confidential or sensitive data).
When the user logs in they only have access to their folder, and any attacker could not disrupt the overall structure. (We understand that if an attacker got the master account then all is lost, but keeping it secure is an internal issue. As long as the user accounts are isolated this is okay.)
Alternative Question: Is anyone aware of a storage system or IoT system which would better suit this use case? We will still require backups/loss prevention as part of the service.

While scaling up, how to make user uploaded files available across multiple servers?

I have a website to which users upload various files and later access them.
The files are currently stored in a specific path on the server. Now if I need to have multiple servers for the website, what is the best way to make the user-uploaded files accessible across multiple servers? Amazon S3 is one option that has crossed my mind. What other options do I have?
First, you can try using a CDN (http://en.wikipedia.org/wiki/Content_delivery_network).
Also, you can do it in-house, by having specialized servers set up for static content. You will probably need a lookup server that knows, for each file, which server it can be found on. It will also contain the logic to determine which server is best for saving a new file. This is more complicated, as you will have to do the load balancing yourself and take the geographic location of users into account.
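The lookup idea can be sketched roughly as follows. This is an in-memory toy with hypothetical class, method and server names. It picks the least-loaded server for each new file and ignores geography, persistence and replication, all of which a real lookup server would need.

```python
class FileLocator:
    """Toy lookup service: remembers which static server holds each file."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._load = {name: 0 for name in servers}  # bytes stored per server
        self._location = {}                         # file id -> server name

    def choose_server(self, file_id, size_bytes):
        """Pick the server for a new upload and record where the file lives."""
        if file_id in self._location:
            return self._location[file_id]
        # Simplest possible placement policy: the server holding the fewest bytes.
        server = min(self._load, key=self._load.get)
        self._load[server] += size_bytes
        self._location[file_id] = server
        return server

    def locate(self, file_id):
        """Return the server holding the file, or None if it is unknown."""
        return self._location.get(file_id)


locator = FileLocator(["static1", "static2"])
locator.choose_server("user42/photo.jpg", 500000)
print(locator.locate("user42/photo.jpg"))
```

Every web server would ask this service where to store or fetch a file, instead of assuming the file sits on its own disk.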

Using CouchDB as an interface. Is it an appropriate way?

Our devices (microscopes with cameras) produce images and additional information for each image.
Now a middleware supplier wants to connect these devices to a lab automation system. They have to acquire the data and we have to provide it. What astonished me was their interface suggestion: a very cryptic token-separated format (ASTM E1394-97). Unfortunately, they can't even accommodate images in their protocol, and are aiming to get file paths instead.
I thought this was not an up-to-date approach. While looking for alternatives, I came across CouchDB.
So my idea was that our devices would import the data, including images, into CouchDB and they could get the data from there. It even seems that, using mustache, we could produce the format they want (ASCII text), placing URLs as image references instead of paths.
My question is: has anyone already applied CouchDB to such a use case? It seems to be a bit of a misuse of CouchDB, as the main intention is an interface, not data storage. Another point that bothers me is that the inventor of CouchDB moved on to another project, Couchbase. Could that mean a lack of support for CouchDB in the future?
Thank you very much for any insights and suggestions!
It's an OK use case, and we're actually using CouchDB in this way: as proxying middleware between medical laboratory analyzers and an LIS. Some of them publish images or PDF data on shared folders, and we just load them into the related document as attachments.
Moreover, you may like to know that CouchDB is able to manage external processes (aka os_daemons) and take care of their lifespan: it restarts them if they have been terminated and starts them right after you update config options through the HTTP interface. This helps to set up the ASTM client and server processes, which communicate with the devices and create documents as regular CouchDB clients, since this protocol is different from HTTP (which is native to CouchDB). In the same way you may set up daemons to monitor shared folders for specific files. And all this is just CouchDB with a few "low bounded" plugins.
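To illustrate loading an image into a document as an attachment and then handing out a URL instead of a file path, here is a small sketch. It only builds the JSON body with an inline (base64) attachment and the URL where that attachment would be served; actually storing it would be a PUT of the document to the database, which is not shown. The function names, the database name, the document fields and the host are hypothetical.

```python
import base64
from urllib.parse import quote


def build_document(doc_id, metadata, image_name, image_bytes, content_type="image/png"):
    """Build a CouchDB document body that carries one image as an inline attachment."""
    doc = dict(metadata)
    doc["_id"] = doc_id
    doc["_attachments"] = {
        image_name: {
            "content_type": content_type,
            # Inline attachments are sent as base64 text inside the JSON body.
            "data": base64.b64encode(image_bytes).decode("ascii"),
        }
    }
    return doc


def attachment_url(base_url, db_name, doc_id, image_name):
    """URL under which the attachment is reachable once the document is stored."""
    parts = [quote(part, safe="") for part in (db_name, doc_id, image_name)]
    return base_url.rstrip("/") + "/" + "/".join(parts)


# Hypothetical values for illustration.
doc = build_document("sample-0001", {"magnification": 40}, "scan.png", b"\x89PNG...")
print(attachment_url("http://localhost:5984", "microscope", doc["_id"], "scan.png"))
```

The resulting URL is what would go into the text output in place of a file path.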

Avoiding data loss: suggested reading

I am about to work on an app which handles extremely valuable data. Any loss of this data for the user would be very costly, so I'm interested in finding out more about the best architecture design for our needs.
The user will be inputting this data in their iPhone each day. The alternative to using this app is carrying around a piece of paper with this sensitive information on it. So while I know we can be more secure than a piece of paper, I want to make sure we also cover the user stories like "I flushed my phone down the toilet" or "my son deleted the app, where's my data?"
A service like Dropbox comes to mind, but I wouldn't want to require our users to have a Dropbox account; the syncing architecture must be transparent to the user. iCloud is out because web and Android versions may follow.
Can anyone suggest either some good reading on this subject, or some good frameworks to look at? I expect to use a node.js backend, and while we are targeting iPhone first, Android will follow.
The data itself consists of 2 tables, each with a small number of fields, with a many to many relationship. A few new rows will be created by the user each day, but the data will be small and highly compressible.
Turns out this is an extremely difficult issue. In data assurance (this isn't yet a security type of situation, although it could become one because of the assurance aspect) there is ALWAYS a time element. As a simple example, what happens if your user has locally updated some piece of data and, just before you have the ability to fully push the data to some cloud service, he / she drops the phone in the toilet? Even if there was good signal for transmitting the data, there is time spent in transferring and time necessary for the cloud server to respond saying the data got there properly.
Generally in data assurance, you really have to do the best you can. You will NEVER be able to solve all issues, as there is no data center, nor link to a data center, that is perfect. There is always a chance of data loss. Truly the best you can do is SYNC as soon as data changes and, if there is loss of connection, as soon as the connection comes back.
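The "sync as soon as data changes, retry when the connection returns" idea can be sketched roughly like this. It is written in Python for illustration even though the question targets iPhone and node.js. The class and the send callback are hypothetical, and a real app would also persist the pending queue on the device so it survives a restart.

```python
from collections import deque


class SyncQueue:
    """Keep local changes until the server has acknowledged each one."""

    def __init__(self, send):
        # send(change) is a hypothetical callable that returns True once the
        # server confirms receipt, and raises OSError when the link is down.
        self._send = send
        self._pending = deque()

    def record_change(self, change):
        """Queue a change and try to push it right away."""
        self._pending.append(change)
        self.flush()

    def flush(self):
        """Push pending changes in order; stop at the first failure."""
        sent = 0
        while self._pending:
            try:
                acknowledged = self._send(self._pending[0])
            except OSError:
                acknowledged = False
            if not acknowledged:
                break
            # Drop a change only after the server has confirmed it.
            self._pending.popleft()
            sent += 1
        return sent

    def pending_count(self):
        return len(self._pending)
```

Calling flush() again whenever the device reports that connectivity is back covers the reconnect case. The window between record_change() and the acknowledgment is exactly the time element described above, and it cannot be removed entirely.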
Now, for security. Security by itself does not create assurance. If the data itself is something that the customer does not want to lose, and that is his only requirement, then security is unnecessary. If he / she is also worried about others getting their hands on the data, then you have to be worried about data in transit (both up and down during syncing) and on the device itself. For the best potential security, encrypt the data locally on the device prior to pushing it to the cloud. There are many known attacks that can get at the data even if you are using SSL or other services. If you wish, locally encrypt a file, then you could still use SSL for SOME added security (at this point you will have doubly encrypted the data). You also want to sign the data so that there is little chance of it being manipulated in transit, or by the cloud server itself (if a hacker hacked the cloud server). To protect the data while on the device, you may choose to have the user input a password, and put some fairly strict rules around how passwords are formed and how many tries you allow before you disallow attempts for 30 minutes or so.
You may also wish to store the data locally in an encrypted form. This way if someone gets the device, they will still need the password before they can get the data (unless of course they can crack the algorithm you use to generate the symmetric key from the password).
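A minimal sketch of two of the pieces mentioned above: deriving a symmetric key from the user's password, and attaching an integrity tag so that tampering can be detected. Python is used here only for illustration. PBKDF2 and HMAC-SHA256 are choices made for this sketch, not something the answer prescribes, and the encryption step itself is left out because it needs a cipher library outside the standard library.

```python
import hashlib
import hmac
import os


def derive_key(password, salt, iterations=200000):
    """Derive a 32-byte key from a password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations, dklen=32)


def sign(key, payload):
    """Compute an HMAC-SHA256 tag over the payload."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify(key, payload, tag):
    """Check the tag in constant time; False means the payload was altered."""
    return hmac.compare_digest(sign(key, payload), tag)


salt = os.urandom(16)  # stored next to the data; it does not need to be secret
key = derive_key("correct horse battery staple", salt)
tag = sign(key, b"already-encrypted payload")
print(verify(key, b"already-encrypted payload", tag))
```

A slow key derivation function is what makes password guessing expensive if the device is stolen. In practice, separate keys for encryption and for the integrity tag are preferable.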
In terms of online data service, you could use iCloud, etc. I am actually NOT a fan of anything cloud. I think it is SO counter to enterprise / proprietary data, it isn't even funny. I think it's actually almost laughable that so many of these phone / device manufacturers are going SOOOOO cloud based. I think they are abandoning the big companies, as NO big company I know of wants to place their proprietary data on a cloud server that THEY DON'T CONTROL. In any case, I would argue that so long as you have a good local encryption scheme prior to sending out the data, then you should be OK. From an assurance perspective, however, I would look at where the servers are located. The reason is that if assurance of data is of prime concern, most larger IT setups like to have replicated data centers on opposite sides of the country / world. If an earthquake takes down the data center on one side of the country, it most likely will NOT take down the one on the other side of the country simultaneously. If the data centers for iCloud or whatever you can find are essentially in one locale, then you may consider syncing with one data center on the west coast, and choosing a completely different data center (in this case, company) on the east coast to sync with as well.
This is all very high level. How you would implement this on an iPhone specifically we could also talk about, but I hope this at least begins to help pave a path.

Amazon S3 + CloudFront Queries

I am currently making a social-sharing-like app and I have encountered a problem.
First off, S3 in my experience is slow, so I need to sync the data to multiple servers around the world to make it faster for users everywhere.
So my question is, do I need to create multiple buckets, one for each country? Amazon has a list of their server locations. So for each user, I calculate the nearest server and then upload there? How?
Next question: in my app people can subscribe to others and check for their updates. So realistically, this would not make a speed difference. If someone in Singapore uploaded a piece of text and has a subscriber in the United States, it wouldn't be any quicker for this subscriber because he has to download a piece of text stored all the way in Singapore.
All of this is making me confused! I personally find S3 very slow, which is why I am using CloudFront.
Any help? Am I misunderstanding the process? Thanks!
Buckets are not per country, they are per region (EU, US, Asia, etc.)
Secondly, you do not have to manage the closest URL to your S3 buckets; that's what CloudFront is for. You just get a single URL for each bucket, and CloudFront will manage routing the user's request to the closest edge location.
PS: In addition, Amazon replicates data uploaded to your bucket across all edge locations transparently.
Amazon in no way "automatically" replicates your content out to the edge locations. Instead, your content is copied to a single edge location if (and only if) the content is not there (it could be the first pull, or it could have expired) when a user tries to access it from that edge. It is a pull mechanism, not a push. See the "Download Distributions for HTTP Delivery" section of http://aws.amazon.com/cloudfront/
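The pull behaviour can be pictured with a toy model of a single edge location. This is only an illustration of the pull-on-miss idea and not CloudFront's actual implementation; the class name, the origin callback and the fixed expiry time are assumptions.

```python
import time


class EdgeCache:
    """Toy model of one edge location that pulls from the origin on demand."""

    def __init__(self, fetch_from_origin, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch_from_origin  # callable: key -> content
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}                 # key -> (content, expiry time)

    def get(self, key):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # hit: served from the edge, origin not contacted
        # Miss or expired: pull from the origin, then keep a copy at this edge.
        content = self._fetch(key)
        self._store[key] = (content, now + self._ttl)
        return content
```

Nothing reaches an edge until a user near that edge asks for it, so the first subscriber in the United States still waits for the object to come from the origin. Later requests through the same edge are served locally until the copy expires.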