We are in the process of building a cluster for our hosted services at work; the final product will host multiple separate services. We are in the middle of deciding how to set up our databases. We are running a PostgreSQL database server which all services in the cluster will use. The debate right now is whether to give each service its own schema in a single database or to give each service its own database.
We just aren't sure which is the better solution for us. None of our services share a common structure, and data does not need to be shared between them. What we are most concerned about is ease of use.
Here's what we care most about; we are really hoping for an objective rather than opinion-based answer:
Backups
Disaster recovery - all services vs individual
Security between services
Performance
For some additional information: the cluster is hosted within AWS, with our database being an RDS instance.
This is what the official PostgreSQL docs say:
Databases are physically separated and access control is managed at the connection level. If one PostgreSQL server instance is to house projects or users that should be separate and for the most part unaware of each other, it is therefore recommendable to put them into separate databases. If the projects or users are interrelated and should be able to use each other's resources they should be put in the same database, but possibly into separate schemas. Schemas are a purely logical structure and who can access what is managed by the privilege system.
Source: http://www.postgresql.org/docs/8.0/static/managing-databases.html
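To make the docs' distinction concrete, here is a minimal sketch of the two layouts (service, schema, and role names are hypothetical):

```sql
-- Option A: one database per service; separation happens at the connection level
CREATE DATABASE service_a;
CREATE DATABASE service_b;

-- Option B: one schema per service inside a shared database;
-- separation happens through the privilege system
CREATE ROLE service_a_role LOGIN;
CREATE SCHEMA service_a AUTHORIZATION service_a_role;
REVOKE ALL ON SCHEMA service_a FROM PUBLIC;  -- keep other services out
```

With option A, each service role simply connects to its own database; with option B, every role connects to the same database and the GRANTs decide which schemas it can touch.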
Disaster recovery - all services vs individual
You can dump and restore one database at a time. You can dump and restore one schema at a time. You can also dump schemas that match a pattern.
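For example, a minimal pg_dump/pg_restore sketch (database, schema, and file names are hypothetical; on RDS you would add the usual -h endpoint and -U user flags):

```sh
# dump one database from the instance
pg_dump -Fc -d service_a -f service_a.dump

# dump one schema out of a shared database
pg_dump -Fc -d shared_db -n service_a -f service_a_schema.dump

# dump every schema matching a pattern
pg_dump -Fc -d shared_db -n 'service_*' -f all_services.dump

# restore a single dump into a target database
pg_restore -d shared_db service_a_schema.dump
```

Either way you can recover one service without touching the others; the difference is mainly which flags you reach for.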
Security between services
I presume you mean isolation between databases and isolation between schemas. The isolation between databases is stronger and more "natural" for developers concerned with "ease of use". For example, if you use one database per service, every developer can just use the public schema for all development. This might seem "easier" than adding schemas to the search path, or "easier" than using schema.object when programming.
It depends in part on how you manage privileges for the roles you use for development, and on how you manage privileges in each database or schema. You can change default privileges.
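As a minimal sketch of the default-privileges point (schema and role names are hypothetical):

```sql
-- run as the role that will create the tables in schema service_a
ALTER DEFAULT PRIVILEGES IN SCHEMA service_a
    GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO service_a_role;
```

Note that ALTER DEFAULT PRIVILEGES only affects objects created afterwards (by the role that ran it); existing tables still need an explicit GRANT.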
Performance
I don't see a measurable difference. YMMV.
Related
I am creating accounting/invoicing software and my database is in PostgreSQL. Should I create a separate database for each user, since the data is sensitive financial data? Or is having a user foreign key secure enough? If I am hosting the database on AWS, I understand that I could have a few DB servers across multiple availability zones and regions, so that if one is compromised it wouldn't affect everyone, even if many users have info stored in a single database. Is this safe enough? Thanks!
In general, no. Encrypt the data so that if someone exfiltrates a dump they can't actually use it without the decryption key. If you're worried that someone with admin access can see users' information, then you might want to consider user-level encryption for all fields containing personally identifiable information.
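As a minimal sketch of that field-level approach using the pgcrypto extension (table, column, and key names are hypothetical; in practice the key should be supplied by the application or a KMS, never stored in the database):

```sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE TABLE customer_pii (
    user_id bigint PRIMARY KEY,
    ssn_enc bytea NOT NULL  -- ciphertext only, never plaintext
);

-- encrypt on write ('app-supplied-key' stands in for a key the app provides)
INSERT INTO customer_pii (user_id, ssn_enc)
VALUES (42, pgp_sym_encrypt('123-45-6789', 'app-supplied-key'));

-- decrypt on read, only for callers that hold the key
SELECT pgp_sym_decrypt(ssn_enc, 'app-supplied-key')
FROM customer_pii
WHERE user_id = 42;
```

A dump of this table is useless without the key, which addresses the exfiltration concern above.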
There are a few ways you could go about it, but I wouldn't create a new DB for every customer. It would be too expensive and a pain to maintain and evolve.
To me, this sounds like you are creating a multi-tenant application.
I'd personally use the row-level security feature in Postgres (see this article) or create a separate schema for each customer.
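A minimal sketch of the row-level security approach (table, policy, and setting names are hypothetical):

```sql
CREATE TABLE invoices (
    id        bigserial PRIMARY KEY,
    tenant_id integer   NOT NULL,
    amount    numeric   NOT NULL
);

ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;

-- each session identifies its tenant once, e.g. SET app.tenant_id = '42';
CREATE POLICY tenant_isolation ON invoices
    USING (tenant_id = current_setting('app.tenant_id')::integer);
```

Roles without BYPASSRLS (and other than the table owner) then only see rows whose tenant_id matches the session's setting, whatever query they run.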
You can add an extra layer of protection with encryption at rest. AWS supports it (link).
My question is kind of similar to this question, but not quite:
Hide a marklogic database to specific user (permissions)
Background - up until now, developers who use database X were all admins on the server (this is a historical config that we have recently inherited), but now we want to add new developers to the server who definitely won't be admins, and who will have a new database Y added to the server.
What we want to do is have several groups of developers using the same MarkLogic 10 server, but have it so that developer group X can only work in their database X, and developer group Y can only work in database Y. We don't care if they can see all databases on the server.
Does this mean we have to apply permissions to every document in every database, or can we control this via roles that limit access to specific databases?
Can someone suggest the right way to achieve this please?
Thanks in advance.
You have two tools to work with:
Granular privileges, which allow you to limit the scope of a privilege to a specific resource (such as a database or forest)
Document permissions on each database, set per document to reflect its intended users, as you already mentioned
However, in my experience, I've generally found this use case is better served by having many small dev clusters rather than one large one as resource contention (one app team pushing CPU to 100%) can become too much of an issue. It is pretty quick and painless to spin up and tear down dev clusters on AWS or Azure. Or, if you're self-hosting, you could look at running multiple MarkLogic Containers on a single host.
We would like to use OrientDB Graph in an Azure environment. Does anybody have experience using it? We would also like to know whether high availability from OrientDB is required under the Azure cloud. Azure already offers high availability for Azure Storage, Azure Drive, and SQL. I understand that they have replication and load balancing built in.
This is super important because we prefer not to get into the business of replication and infrastructure management.
Thanks
You can spin up 2 or more machines, install OrientDB on them, and then configure them together as a distributed cluster. However, I haven't been able to find any simpler or easier way to do it. I am interested in this topic too.
Azure does have features such as geo-replication, which protects your data against a major data-center incident but doesn't provide any performance benefit and will not make the database highly available.
Although the platform is pretty reliable, Microsoft will occasionally reboot servers for updates, so to protect against downtime you can use affinity groups so that, of your 2 or more servers, one will always be online. This does, however, need to be used in conjunction with database replication and ideally load balancing.
It's also worth noting that OrientDB recommends clusters have an odd number of servers as this can prevent conflicts when synchronising data after a communication issue between the servers.
I am using it on Amazon, and I had to create a Java project to monitor HTTP requests, inserts, and queries. The queries are very fast, but inserting data takes longer.
I recommend this kind of graph database model to decrease query time. Also, OrientDB handles empty fields very well compared to other databases.
If you need help with the Java project, you can reply to this post and I'll help you.
I hope it helps. Good luck.
This is from another question, but I think it should be answered by the Meteor team because I can't find a straight answer so far.
"..We have decided to use MongoDB for a SaaS offering we are creating. Each company that signs up gets their own url (mycompany.domain.com) and their own private set of users, projects, etc... Since we are using a NoSQL solution, and wouldn't have to manage pushing out schema updates to every database like we would with MySQL, I am wondering if it would be better to have one huge database containing all the data, or to have one database per client..."
So, with the Meteor approach (one Meteor project/server), can I have:
1) A different URL for each company
2) A different database (on the same MongoDB server) for each company and for that specific company's users?
If you look at Meteor's own hosting, they use a MongoDB server from MongoHQ. You could use multiple Meteor servers with a single MongoDB server and multiple databases.
I would think it depends more on your app's design; Meteor can accommodate either approach.
1) You could use publish functions to provide each client with only their own records from one huge DB; get the subdomain (HTTP host) into the publish function so that it only gives out data for that set.
2) Use separate Meteor instances connecting to their own MongoDB databases on one server, and use some kind of proxy to serve them to the subdomains. You could push each one with whatever data you'd like, perhaps even entirely separate app sets.
It really depends on what you're building. If you want to update one set of data and have it update for everyone, option 1) is the better fit.
The benefit of using separate Meteor instances is primarily customization. It's really hard to get the gist of what you want with the details you've given, so I'll cut it short: if you want each client to be able to be very different, use 2); otherwise use 1).
If you look at Meteor.com's hosting, I think each deployment is given its own database. The main reason is customization: everyone's deployment is likely to be completely different.
UPDATE:
As of March 2014, there is a third-party Atmosphere package, meteor-dbproxy, that allows you to use multiple MongoDB servers (as well as separate oplog integration endpoints) in your backend, thus giving you db-level sandboxed multi-tenancy.
From a MongoDB point of view, you can do a database per client. The current stable MongoDB version, 2.2, has database-level locking, as opposed to the global lock of previous versions.
This way, if one of your clients is hammering the system, they don't affect your other clients with a global lock.
I want to use a NoSQL database on Windows Azure, and the data volume will be very large. Which offers better performance and scalability: Azure Table storage, or a MongoDB database running in a worker role? Has anyone used MongoDB on Azure in a worker role? Please share your thoughts on using MongoDB on Azure versus Azure Table storage.
Table Storage is a core Windows Azure storage feature, designed to be scalable (500TB per account), durable (triple-replicated in the data center, optionally geo-replicated to another data center), and schemaless (each row may contain any properties you want). A row is located by partition key + row key, providing very fast lookup. All Table Storage access is via a well-defined REST API usable from any language (with SDKs, built on top of the REST APIs, already in place for .NET, PHP, Java, Python & Ruby).
MongoDB is a document-oriented database. To run it in Azure, you need to install MongoDB onto a web/worker role or a Virtual Machine, point it to a cloud drive (thereby providing a drive letter) or an attached disk (for Windows/Linux Virtual Machines), optionally turn on journaling (which I'd recommend), and optionally define an external endpoint for your use (or access it via a virtual network). The Cloud Drive / attached disk, by the way, is actually stored in an Azure Blob, giving you the same durability and geo-replication as Azure Tables.
When comparing the two, remember that Table Storage is Storage-as-a-Service: you simply access a well-known REST endpoint. With MongoDB, you're responsible for maintaining the database (e.g. whenever MongoDB Inc (formerly 10gen) pushes out a new version of MongoDB, you'll need to update your server accordingly).
Regarding MongoDB Inc's alpha version pointed to by jtoberon: If you take a close look at it, you'll see a few key things:
The setup is for a standalone MongoDB instance, without replica sets or shards. Regarding replica sets, you still get several of their benefits using the standalone version, due to the way Blob storage works.
To provide high-availability, you can run with multiple instances. In this case, only one instance serves the database, and one is a 'warm-standby' that launches the mongod process as soon as the other instance fails (for maintenance reboot, hardware failure, etc.).
While 10gen's Windows Azure wrapper is still considered 'alpha', mongod.exe is not. You can launch mongod.exe just like you'd launch any other Windows exe. It's just the management code around the launching that the alpha implementation is demonstrating.
EDIT 2011-12-8: This is no longer in an alpha state. You can download the latest MongoDB+Windows Azure project here, which provides replica-set support.
For performance, I think you'll need to do some benchmarking. Having said that, consider the following:
When accessing either Table Storage or MongoDB from, say, a Web Role, you're still reaching out to the Windows Azure Storage system.
MongoDB uses lots of memory for its own cache. For this reason, lots of high-scale MongoDB systems are deployed to larger instance sizes. For Table Storage access, you won't have the same memory-size consideration.
EDIT April 7, 2015
If you want to use a document-based database as-a-service, Azure now offers DocumentDB.
I have used both.
Azure Tables: dead simple, fast, but really hard to write even simple queries.
Mongo: runs nicely, lots of querying capabilities, but requires several instances to be reliable.
In a nutshell,
if your queries are really simple (key->value), you must run a cost comparison (mainly the number of storage transactions versus the cost of hosting Mongo on Azure). I would rather go with Table Storage for that one.
If you need more elaborate queries and don't want to go to SQL Azure, Mongo is likely your best bet.
I realize that this question is dated. I'd like to add the following info for those who may come upon this question in their searches.
Note that now, MongoDB is offered as a fully managed service on Azure. (officially in Beta as of Apr '15)
See:
http://www.mongodb.com/partners/cloud/microsoft
or
https://azure.microsoft.com/en-us/blog/announcing-new-mongodb-instances-on-microsoft-azure/
See (including pricing):
https://azure.microsoft.com/en-us/marketplace/partners/mongolab/mongolab/
My first choice is Azure Tables, because of the SaaS model, the low cost, and the 99.99% SLA: http://alexandrebrisebois.wordpress.com/2013/07/09/what-if-20000-windows-azure-storage-transactions-per-second-isnt-enough/
Some limits:
http://msdn.microsoft.com/en-us/library/windowsazure/jj553018.aspx
http://www.windowsazure.com/en-us/pricing/calculator/?scenario=data-management
or Azure SQL for small businesses
DocumentDB
http://azure.microsoft.com/en-us/documentation/services/documentdb/
http://azure.microsoft.com/en-us/documentation/articles/documentdb-limits/
My second choice: many cloud providers, including Amazon, offer S3.
Or Google BigQuery: https://developers.google.com/bigquery/pricing
My nth choice: manage the whole show myself with MongoDB and get no sleep... well, I will look again at the first two SaaS options.
My choice: if I am running in the "cloud", I will go for the SaaS model as much as possible; "rent it"...
The question is what my app needs: is it Azure Tables, DocumentDB, or Azure SQL?
DocumentDB documentation
http://azure.microsoft.com/en-us/documentation/services/documentdb/
How Azure pricing works
http://azure.microsoft.com/en-us/pricing/details/documentdb/
This is fun:
http://www.documentdb.com/sql/demo
At Build 2016 it was announced that DocumentDB would support all MongoDB drivers. This addresses some of DocumentDB's tooling gaps and also makes it easier to migrate Mongo apps.
The above answers are all good, but the real answer depends on your requirements. You need to understand what size of data you are processing and what types of operations you want to perform on it, and then select the solution that best meets your needs.
One thing to remember is that Azure Table Storage doesn't support complex data types. Every property in an entity must be a string, number, boolean, date, etc.
You can't store an arbitrary object against a key, which I feel is a must for a NoSQL DB.
https://learn.microsoft.com/en-us/rest/api/storageservices/fileservices/understanding-the-table-service-data-model (scroll to "Property Types")