I am creating accounting/invoicing software and my database is in PostgreSQL. Should I create a separate database for each user since the data is sensitive financial data? Or is having a user foreign key secure enough? If I am hosting the database on AWS, I understand that I could have a few DB servers across multiple availability zones and regions so that if one is compromised it wouldn't affect everyone, even if many users have info stored in a single database. Is this safe enough? Thanks!
In general, no. Encrypt the data so that if someone exfiltrates a dump they can't actually use it without the decryption key. If you're worried that someone with admin access can see the user's information, then you might want to consider user-level encryption for all fields related to personally identifiable information.
There are a few ways you could go about it, but I wouldn't create a new DB for every customer. It will be too expensive and a pain to maintain and evolve.
To me, this sounds like you are creating a multi-tenant application.
I’d personally use the row-level security feature in Postgres (see this article) or create a separate Schema for each Customer.
You can add an extra layer of protection with encryption at rest. AWS supports it (link).
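To make the row-level security suggestion a bit more concrete, here is a minimal sketch that just generates the PostgreSQL statements for a tenant-isolation policy. The table name `invoices`, the column `tenant_id`, the policy name `tenant_isolation`, and the session setting `app.current_tenant` are all made up for illustration; adapt them to your own schema.

```python
def rls_statements(table: str, tenant_column: str = "tenant_id",
                   setting: str = "app.current_tenant") -> list[str]:
    """Build the SQL that limits each session to rows of its own tenant.

    All names here are hypothetical; only the SQL keywords are PostgreSQL's.
    """
    # Refuse anything that is not a plain identifier, since the names are
    # interpolated directly into the SQL text.
    for name in (table, tenant_column):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    return [
        # Turn the feature on for the table.
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        # Apply the policy to the table owner as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY;",
        # Rows are visible only when their tenant matches the session setting.
        f"CREATE POLICY tenant_isolation ON {table} "
        f"USING ({tenant_column}::text = current_setting('{setting}'));",
    ]


statements = rls_statements("invoices")
for stmt in statements:
    print(stmt)
```

With a policy like this, the application would set `app.current_tenant` on each connection or transaction before running queries. Keep in mind that superusers and roles with BYPASSRLS are not restricted by policies, so the application should connect with an ordinary role.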
Prototyping a project with Mongo & Spring Boot and thinking it does a lot of what I want. However, I really need to have encrypted data-at-rest, which would seem to indicate I have to purchase the enterprise version. Since I don't have a budget yet, I am wondering if there is another alternative that people have found useful? I think DynamoDB can be used in a local & test environment. Or is it viable to encrypt the data at the application level and still have great performance for my CRUD operations?
I've done application-level encryption with DynamoDB before with some success. My issues were not really with DynamoDB but with the encryption in the application.
First, encryption/decryption is very expensive. I had to more than double the number of servers I was using just to handle the extra CPU load. Your mileage may vary. In my case, I was using Node.js and the servers suddenly switched from being I/O bound to being CPU bound.
Second, doing encryption/decryption application side adds a lot of complexity to your app. You will almost certainly need to parallelize the encryption/decryption to minimize the added latency that it will cause. Also, you will need to figure out a secure way of sharing the keys.
Last, application level encryption will make some DynamoDB operations unavailable to you. For example, conditions probably won't make sense anymore for encrypted values.
Long story short, I wouldn't recommend application level encryption regardless of the database.
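To illustrate the point about conditions on encrypted values, here is a small sketch. The cipher below is a toy built from SHA-256 and is NOT secure; it only stands in for a real randomized cipher (for example AES with a random nonce from a proper library), and every name in it is invented for the example. The behaviour it shows is the relevant part: the same plaintext is stored as a different value each time, so an equality or range condition evaluated by the database on the stored value no longer means anything.

```python
import hashlib
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key, nonce and a block counter (not secure)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Prefix a fresh random nonce, then XOR the plaintext with the keystream."""
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    """Split off the nonce and XOR the rest with the same keystream."""
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


key = os.urandom(32)
a = toy_encrypt(key, b"balance=100")
b = toy_encrypt(key, b"balance=100")

# Same plaintext, but the stored values differ because each has its own nonce,
# so a database-side "attribute = :value" condition cannot match them.
print(a == b)
# Only the application, which holds the key, can compare the real values.
print(toy_decrypt(key, a) == toy_decrypt(key, b))
```

This is also where the extra CPU and complexity come from: every comparison or filter on such a field has to happen in the application after decrypting.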
DynamoDB now supports what they call Server-Side Encryption at Rest. Personally I think that name is a little confusing but from their perspective, your application is the client and DynamoDB is the server.
Amazon DynamoDB encryption at rest helps you secure your application
data in Amazon DynamoDB tables further using AWS-managed encryption
keys stored in AWS Key Management Service (KMS). Encryption at rest is
fully transparent to the user with all DynamoDB queries working
seamlessly on encrypted data. With this new capability, it has never
been easier to use DynamoDB for security-sensitive applications with
strict encryption compliance and regulatory requirements.
Blog post about DynamoDB encryption at rest
You simply enable encryption when you create a new table and DynamoDB
takes care of the rest. Your data (tables, local secondary indexes,
and global secondary indexes) will be encrypted using AES-256 and a
service-default AWS Key Management Service (KMS) key. The encryption
adds no storage overhead and is completely transparent; you can
insert, query, scan, and delete items as before. The team did not
observe any changes in latency after enabling encryption and running
several different workloads on an encrypted DynamoDB table.
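As a rough sketch of what "enable encryption when you create a new table" can look like from code, the function below builds the parameters for DynamoDB's CreateTable call with `SSESpecification` set. The table name `invoices` and the key attribute `id` are assumptions for the example, and the actual call through boto3 is shown only as a comment because it needs the library and AWS credentials.

```python
def encrypted_table_params(table_name: str, key_attribute: str = "id") -> dict:
    """Keyword arguments for CreateTable with KMS-based encryption at rest.

    The table and attribute names are placeholders for illustration.
    """
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": key_attribute, "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": key_attribute, "KeyType": "HASH"},
        ],
        "BillingMode": "PAY_PER_REQUEST",
        # Ask for server-side encryption with a KMS key.
        "SSESpecification": {"Enabled": True, "SSEType": "KMS"},
    }


params = encrypted_table_params("invoices")

# Assumed usage (requires boto3 and configured AWS credentials):
#   import boto3
#   boto3.client("dynamodb").create_table(**params)
print(params["SSESpecification"])
```

Reads and writes against the table stay the same as for an unencrypted one, which is the "transparent" part described in the quote.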
I'm looking for ways to make the Mongo storage used by Spark.Net HIPAA compliant. Is using MongoDB SSL Transport to encrypt data on the wire and Gazzang for data at rest good enough? Are there other options for data at rest while still allowing for indexing certain properties in JSON?
Given HIPAA compliance has a number of privacy and security requirements, I would assume you are also getting some professional advice on how to comply. There are physical and technical requirements that extend beyond the database software, but encryption of data in motion and at rest will tick some of the boxes.
I will add the disclaimer that "I am not a lawyer or a HIPAA expert", so you'll have to research/confirm the specific compliance details for your use case.
Encryption of data in motion
The default binary distributions of MongoDB currently do not include SSL support. You can either build from source with SSL support, or get a commercial license for MongoDB Enterprise.
You should ensure the SSL mode is set to requireSSL and enable certificate validation with x.509 certificates.
I believe you need to use a Federal Information Processing Standard (FIPS) compliant encryption algorithm. FIPS mode is currently only supported in MongoDB Enterprise.
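On the client side, a connection that insists on encrypted transport and presents an x.509 certificate could be described with a connection string along the lines of the sketch below. The host name and file paths are placeholders, and the option names used (`tls`, `tlsCAFile`, `tlsCertificateKeyFile`, `authMechanism`) are the ones from the current MongoDB connection string format; older drivers and servers use `ssl`-prefixed equivalents, so check what your versions expect.

```python
from urllib.parse import urlencode


def tls_mongo_uri(host: str, ca_file: str, client_cert: str,
                  port: int = 27017) -> str:
    """Build a MongoDB URI that requires TLS and x.509 client authentication.

    Host and file paths are supplied by the caller; nothing here connects.
    """
    options = {
        "tls": "true",                         # refuse unencrypted connections
        "tlsCAFile": ca_file,                  # CA used to validate the server
        "tlsCertificateKeyFile": client_cert,  # client certificate plus key
        "authMechanism": "MONGODB-X509",       # authenticate with the certificate
    }
    # urlencode percent-encodes the values, e.g. "/" in the file paths.
    return f"mongodb://{host}:{port}/?{urlencode(options)}"


uri = tls_mongo_uri("db.example.com", "/etc/ssl/ca.pem", "/etc/ssl/client.pem")
print(uri)
```

The server still has to be configured to require SSL as described above; the URI only covers what the application asks for.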
Encryption of data at rest
MongoDB (as at 2.6) does not have built-in support for encryption of data at rest; however, there are a number of third-party partner solutions, which currently include:
BitLocker Drive Encryption
Vormetric Data Security Platform
IBM Guardium Data Encryption
The above solutions can be used to transparently encrypt the data directories used by MongoDB, so you still have full access to query and indexing functionality.
It's likely that some of the data you store may have more stringent requirements (eg. around privacy or redaction of specific fields) so there may be some additional application logic to implement.
Related information
The MongoDB Security Architecture white paper goes into more detail on security & auditing options.
The Security section of the MongoDB manual includes some specifics on best practices and configuration.
The Java web app I'm developing allows users to upload files (pictures and documents) to their profiles and define access rules for those files (define which of the other users are able to view / download the file). The access control / permission system is custom made and rules are stored in MongoDB alongside the user's profile and the actual file entry.
Knowing that I need the application and storage to be distributed and fault-tolerant I need to figure out which is the best strategy for file storage.
Should I store the files inside MongoDB, in the files collection where the file document containing the description and access rules is located?
Or should I store the files in the server's file system and keep the path in the MongoDB document? With the filesystem approach, will I still be able to enforce the user-defined access permissions, and how?
Finally, in the filesystem approach, how do I distribute files across servers? Should I use dedicated servers for this, or can I store the files on the webapp servers or MongoDB servers?
Thanks a lot for all your insights! Any help or feedback appreciated.
Alex
There are several alternatives:
put files in a storage service (e.g. S3): easy and plenty of space, but poor performance
put files in a local filesystem: fast but doesn't scale
put files in MongoDB documents: easy, powerful and scalable, but limited to 16MB per document
use the GridFS layer of MongoDB: functionality is limited, but it is made for scalability (thanks to sharding) and is fairly fast too. Note that you can put info about the file (permissions etc.) right into the file's metadata object.
In your case it sounds like the last option may be best. Quite a few users have switched from the filesystem to GridFS and it worked very well for them.
Things to keep in mind:
GridFS sharding works but is not perfect: usually only the data is sharded, not the metadata. Not a big deal, but the shard holding the metadata must be very safe.
it can be beneficial to run GridFS in a separate MongoDB cluster from your core data, since the requirements (storage, backup, etc.) are usually different.
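Whichever storage you pick, the access rules can live in the file's metadata and be checked by the application before it streams the file. Here is a minimal sketch of that check; the field names `owner`, `allowed_users` and `public` are invented for the example and are not a GridFS convention.

```python
def can_download(metadata: dict, user_id: str) -> bool:
    """Decide whether user_id may read a file, based on its metadata document.

    The metadata layout (owner / allowed_users / public) is hypothetical.
    """
    if metadata.get("public", False):
        return True                      # anyone may read public files
    if user_id == metadata.get("owner"):
        return True                      # owners always see their own files
    # Otherwise the user must be listed explicitly.
    return user_id in metadata.get("allowed_users", [])


# Example metadata as it might be stored alongside the file.
file_metadata = {
    "owner": "alex",
    "allowed_users": ["bob"],
    "public": False,
    "description": "Holiday picture",
}

for user in ("alex", "bob", "carol"):
    print(user, can_download(file_metadata, user))
```

The same check works for the filesystem approach as long as files are only ever served through the application and never directly from a public path.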
I want to use a NoSQL database on Windows Azure and the data volume will be very large. Would Azure Table storage or a MongoDB database running in a Worker role offer better performance and scalability? Has anyone used MongoDB on Azure using a Worker role? Please share your thoughts on using MongoDB on Azure over Azure Table storage.
Table Storage is a core Windows Azure storage feature, designed to be scalable (500TB per account), durable (triple-replicated in the data center, optionally georeplicated to another data center), and schemaless (each row may contain any properties you want). A row is located by partition key + row key, providing very fast lookup. All Table Storage access is via a well-defined REST API usable through any language (with SDKs, built on top of the REST APIs, already in place for .NET, PHP, Java, Python & Ruby).
MongoDB is a document-oriented database. To run it in Azure, you need to install MongoDB onto a web/worker role or Virtual Machine, point it to a cloud drive (thereby providing a drive letter) or attached disk (for Windows/Linux Virtual Machines), optionally turn on journaling (which I'd recommend), and optionally define an external endpoint for your use (or access it via virtual network). The Cloud Drive / attached disk, by the way, is actually stored in an Azure Blob, giving you the same durability and georeplication as Azure Tables.
When comparing the two, remember that Table Storage is Storage-as-a-Service: you simply access a well-known REST endpoint. With MongoDB, you're responsible for maintaining the database (e.g. whenever MongoDB Inc (formerly 10gen) pushes out a new version of MongoDB, you'll need to update your server accordingly).
Regarding MongoDB Inc's alpha version pointed to by jtoberon: If you take a close look at it, you'll see a few key things:
The setup is for a Standalone mongodb instance, without replica-sets or shards. Regarding replica-sets, you still get several benefits using the Standalone version, due to the way Blob storage works.
To provide high-availability, you can run with multiple instances. In this case, only one instance serves the database, and one is a 'warm-standby' that launches the mongod process as soon as the other instance fails (for maintenance reboot, hardware failure, etc.).
While 10gen's Windows Azure wrapper is still considered 'alpha,' mongod.exe is not. You can launch the mongod exe just like you'd launch any other Windows exe. It's just the management code around the launching, and that's what the alpha implementation is demonstrating.
EDIT 2011-12-8: This is no longer in an alpha state. You can download the latest MongoDB+Windows Azure project here, which provides replica-set support.
For performance, I think you'll need to do some benchmarking. Having said that, consider the following:
When accessing either Table Storage or MongoDB from, say, a Web Role, you're still reaching out to the Windows Azure Storage system.
MongoDB uses lots of memory for its own cache. For this reason, lots of high-scale MongoDB systems are deployed to larger instance sizes. For Table Storage access, you won't have the same memory-size consideration.
EDIT April 7, 2015
If you want to use a document-based database as-a-service, Azure now offers DocumentDB.
I have used both.
Azure Tables : dead simple, fast, really hard to write even simple queries.
Mongo : runs nicely, lots of querying capabilities, requires several instances to be reliable.
In a nutshell,
If your queries are really simple (key->value), you must run a cost comparison (mainly the number of transactions against the storage versus the cost of hosting Mongo on Azure). I would rather go with Table Storage for that one.
If you need more elaborate queries and don't want to go to SQL Azure, Mongo is likely your best bet.
I realize that this question is dated. I'd like to add the following info for those who may come upon this question in their searches.
Note that now, MongoDB is offered as a fully managed service on Azure. (officially in Beta as of Apr '15)
See:
http://www.mongodb.com/partners/cloud/microsoft
or
https://azure.microsoft.com/en-us/blog/announcing-new-mongodb-instances-on-microsoft-azure/
See (including pricing):
https://azure.microsoft.com/en-us/marketplace/partners/mongolab/mongolab/
My first choice is Azure Tables because of the SaaS model, the low cost, and the 99.99% SLA: http://alexandrebrisebois.wordpress.com/2013/07/09/what-if-20000-windows-azure-storage-transactions-per-second-isnt-enough/
Some limits:
http://msdn.microsoft.com/en-us/library/windowsazure/jj553018.aspx
http://www.windowsazure.com/en-us/pricing/calculator/?scenario=data-management
or AzureSQL for small business
DocumentDB
http://azure.microsoft.com/en-us/documentation/services/documentdb/
http://azure.microsoft.com/en-us/documentation/articles/documentdb-limits/
Second choice: many cloud providers have comparable offerings, including Amazon with S3
or Google tables: https://developers.google.com/bigquery/pricing
Nth choice: manage the whole show myself with MongoDB and get no sleep. Well, I would look at the first two SaaS options again.
My choice: if I am running in the "CLOUD", I will go for the SaaS model as much as possible: "RENT-IT"...
The question is what my app needs: is it Azure Tables, DocumentDB, or Azure SQL?
DocumentDB documentation
http://azure.microsoft.com/en-us/documentation/services/documentdb/
How Azure pricing works
http://azure.microsoft.com/en-us/pricing/details/documentdb/
this is fun
http://www.documentdb.com/sql/demo
At Build 2016 it was announced that DocumentDB would support all MongoDB drivers. This solves some of the lack of tooling issues with DocDB and also makes it easier to migrate Mongo apps.
Above answers are all good - but the real answer depends on what your requirements are. You need to understand what size of data you are processing, what types of operations you want to perform on the data and then select the solution that meets your needs.
One thing to remember is that Azure Table Storage doesn't support complex data types. Every property in an entity has to be a string, number, boolean, date, etc.
One can't store an object against a key, which I feel is a must for a NoSQL DB.
https://learn.microsoft.com/en-us/rest/api/storageservices/fileservices/understanding-the-table-service-data-model scroll to Property Types
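A common workaround for the missing complex types is to serialize nested values to a JSON string before storing them. The sketch below shows the idea with plain dictionaries; `PartitionKey` and `RowKey` are the real key property names, while the helper names and the sample customer data are made up for illustration, and no Azure SDK is involved.

```python
import json

SIMPLE_TYPES = (str, int, float, bool)


def to_table_entity(partition_key: str, row_key: str, obj: dict) -> dict:
    """Flatten a Python object into properties made only of simple types."""
    entity = {"PartitionKey": partition_key, "RowKey": row_key}
    for name, value in obj.items():
        if isinstance(value, SIMPLE_TYPES):
            entity[name] = value
        else:
            # Nested dicts and lists are stored as a JSON string property.
            entity[name] = json.dumps(value, sort_keys=True)
    return entity


def read_nested(entity: dict, name: str):
    """Decode a property that was stored as JSON by to_table_entity."""
    return json.loads(entity[name])


customer = {
    "name": "Contoso",
    "active": True,
    "address": {"city": "Paris", "zip": "75001"},
}
entity = to_table_entity("customers", "42", customer)
print(entity["address"])
print(read_nested(entity, "address")["city"])
```

The trade-off is that the service cannot filter on anything inside the serialized string, so any field you need to query by should stay a top-level property.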