Although this question favors PostgreSQL, it is still a general DB question.
I have always been curious about the term schema as it relates to databases. Recently, we switched over to using PostgreSQL, where that term has actual significance to the underlying database structure.
In PostgreSQL-land, the hierarchical structure is as follows:
DB Server (`some-server.com:5432`)
>> Database (`fizz`)
>> Schema (`buzz`)
>> Table (`foo`)
Thus, the fully qualified name for table [foo] is fizz.buzz.foo.
I understand that a "database" is a logical grouping of tables. For instance, an organization might have a "domain" database where all POJOs/VOs are persisted, an "orders" database where all sales-related info is stored, and a "logging" database where all log messages get sent for future analysis, etc.
The introduction of this "schema" construct in between the database and its tables has me very confused, and the PostgreSQL documentation is a little too heavy-handed (and lacking good examples) for a newbie such as myself to understand.
I'm wondering if anyone can give me a layman's description of what this "schema" construct is within the realm of PostgreSQL (and how it relates databases to tables), and also what it means to database structures in general.
Thanks in advance!
Think of schemas as namespaces. We can use them to logically group tables (such as a People schema). Additionally, we can assign security to that schema so we can allow certain folks to look at a Customer schema, but not an Employee schema. This allows us to have a granularity of control of security just above an object level but below the database level.
Security is probably the most important reason to use schemas, but I've seen them used for logical groupings as well. It just depends on what you need them for.
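To make the namespace idea concrete, here is a minimal runnable sketch. It uses SQLite from Python as a stand-in, because SQLite's attached databases can be addressed as `name.table` much like PostgreSQL schemas; it does not show PostgreSQL permissions. The `customer` and `employee` namespaces and the `people` table are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for schemas: each attached database is a separate namespace.
conn.execute("ATTACH DATABASE ':memory:' AS customer")
conn.execute("ATTACH DATABASE ':memory:' AS employee")

# The same table name can exist once per namespace without clashing.
conn.execute("CREATE TABLE customer.people (name TEXT)")
conn.execute("CREATE TABLE employee.people (name TEXT)")
conn.execute("INSERT INTO customer.people VALUES ('Alice')")
conn.execute("INSERT INTO employee.people VALUES ('Bob')")

# Qualifying the table with its namespace picks the right one.
customer_names = [row[0] for row in conn.execute("SELECT name FROM customer.people")]
employee_names = [row[0] for row in conn.execute("SELECT name FROM employee.people")]
```

In PostgreSQL the namespaces would be created with CREATE SCHEMA, and access would be controlled per schema with GRANT, which this stand-in cannot demonstrate.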
Late to the party, but ..
I use schemas to split tables into groups used by different applications that share a few tables, for example:
users
application1
application2
Here, if we log in with app1, we see users + application1; if we log in to app2, we see users and application2. So our user data can be shared between both, without exposing app1 users to app2 data. It also means that a superuser can do queries across both sets of data.
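One way such a setup could be expressed, sketched in Python as a generator of PostgreSQL statements. The helper name `app_role_statements` and the role names are hypothetical; the answer does not say which privileges were actually granted, and privileges on the tables themselves would still have to be granted separately.

```python
def app_role_statements(role, private_schema, shared_schema="users"):
    """Build statements that let `role` see the shared schema plus its own schema."""
    for identifier in (role, private_schema, shared_schema):
        # Guard against anything that is not a plain identifier (illustrative check only).
        if not identifier.isidentifier():
            raise ValueError(f"not a plain identifier: {identifier!r}")
    schemas = f"{shared_schema}, {private_schema}"
    return [
        # USAGE lets the role look up objects inside the schemas.
        f"GRANT USAGE ON SCHEMA {schemas} TO {role};",
        # search_path decides which schemas unqualified table names resolve against.
        f"ALTER ROLE {role} SET search_path = {schemas};",
    ]

app1_statements = app_role_statements("app1", "application1")
app2_statements = app_role_statements("app2", "application2")
```

Each role ends up with `users` plus exactly one application schema on its search path, which mirrors the app1/app2 split described above.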
I am starting a new project which will be available as a SaaS for multiple customers. So, I am thinking of creating a database and then creating an individual schema for every customer.
I have defined some rules and the first rule is all the customers must always have the same schema. No matter what. If one customer gets an update, all the other customers will get the update as well.
For this purpose, my question is, is it possible to inherit schema from another schema in the same database? If not, do I have to manually create all the tables and indexes in the new schema and inherit them from the tables in master schema?
I am using PostgreSQL 9.6 but I can upgrade it as well if needed.
I am open to suggestions.
Thanks in advance
There is no automated way to establish inheritance between all tables in two schemas. You'd have to do it one by one (a function can help).
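As a rough sketch of the "one by one" approach, the following Python helper generates one PostgreSQL `CREATE TABLE ... INHERITS` statement per table. The helper name, the `master` and `customer_a` schema names, and the table list are all assumptions for illustration; in practice the table list would come from the catalog, and the answer's "function" may well have meant a server-side one.

```python
def inherit_statements(master_schema, tenant_schema, tables):
    """Build statements that create a tenant schema whose tables inherit from the master's."""
    for identifier in (master_schema, tenant_schema, *tables):
        # Illustrative guard: only accept plain identifiers.
        if not identifier.isidentifier():
            raise ValueError(f"not a plain identifier: {identifier!r}")
    statements = [f"CREATE SCHEMA {tenant_schema};"]
    for table in tables:
        # One child table per master table; inheritance is declared table by table.
        statements.append(
            f"CREATE TABLE {tenant_schema}.{table} () "
            f"INHERITS ({master_schema}.{table});"
        )
    return statements

statements = inherit_statements("master", "customer_a", ["orders", "invoices"])
```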
However, I invite you to stop and think about your data model for a bit. How many users do you expect? If there could be many, plan differently, because databases with thousands of schemas become unwieldy (e.g. catalog lookups will become slow).
You might be better off with one schema for all users. If you are concerned with separation of the data and security, row level security might be the solution for you.
Relational databases are able to set permissions for users to insert, update, delete, etc. by schema or table (e.g. I can allow bob CRUD access to table someschema.XYZ but only allow read access to someschema.FooBar and no access to schema ABC).
Graph databases do not have predefined schemas but have an arbitrary set of node types. Is it possible to set restrictions on a graph database for what a user can access like you do for relational databases, or does this granularity not exist in graph databases due to its nature?
I am specifically looking at Neo4j but if this exists in other examples, then I would like to know.
Neo4j allows you to implement your own SecurityRules. A SecurityRule acts much like a servlet filter: every request is evaluated against the SecurityRule.
However you have to implement the logic on your own which gives great flexibility but might also cause a serious amount of work.
We have a SaaS application where each tenant has its own database in Postgres. How would I apply a patch to all the databases? For example, if I want to add a table or add a column to a table, I have to either write a program that loops through all databases and executes SQL against them or, using pgAdmin, go through them one by one.
Is there a smarter and/or faster way?
Any help is greatly appreciated.
Yes, there's a smarter way.
Don't create a new database for each tenant. If everything is in one database then you only need to alter one database.
Pick one database, alter each table to have the column TENANT and add this to the primary key. Then insert into this database every record for all tenants and drop the other databases (obviously considerably more work than this as your application will need to be changed).
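A minimal sketch of what such a shared table could look like, using SQLite from Python as a stand-in for Postgres. The `orders` table, its columns, and the sample rows are invented for illustration; only the idea of a `tenant` column that is part of the primary key comes from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The tenant column is part of the primary key, so order ids only
# need to be unique within one tenant.
conn.execute(
    """
    CREATE TABLE orders (
        tenant   INTEGER NOT NULL,
        order_id INTEGER NOT NULL,
        item     TEXT,
        PRIMARY KEY (tenant, order_id)
    )
    """
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, "widget"), (1, 2, "gadget"), (2, 1, "sprocket")],
)

def orders_for(tenant):
    """Every query filters on the tenant column to keep tenants apart."""
    cursor = conn.execute(
        "SELECT order_id, item FROM orders WHERE tenant = ? ORDER BY order_id",
        (tenant,),
    )
    return cursor.fetchall()
```

With this layout a schema change is a single ALTER TABLE, but the application becomes responsible for filtering on `tenant` in every query.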
The differences with your approach are extensively discussed elsewhere:
What problems will I get creating a database per customer?
What are the advantages of using a single database for EACH client?
Multiple schemas versus enormous tables
Practicality of multiple databases per client vs one database
Multi-tenancy - single database vs multiple database
If you don't put everything in one database then I'm afraid you have to alter them all individually, and doing it programmatically would be simplest.
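The programmatic route can be as small as a loop. Below is a sketch that uses in-memory SQLite databases as stand-ins for the per-tenant Postgres databases; the `apply_patch` helper, the tenant names, and the `orders` table are hypothetical. With Postgres, the connections would come from a driver and a list of tenant databases instead.

```python
import sqlite3

def apply_patch(connections, statements):
    """Run each statement on every tenant database; return the tenants that failed."""
    failed = []
    for tenant, conn in connections.items():
        try:
            for sql in statements:
                conn.execute(sql)
            conn.commit()
        except sqlite3.Error:
            conn.rollback()
            # Record the failure and keep going so one tenant does not block the rest.
            failed.append(tenant)
    return failed

# Two stand-in tenant databases with the same starting table.
tenants = {name: sqlite3.connect(":memory:") for name in ("acme", "globex")}
for conn in tenants.values():
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")

failed = apply_patch(tenants, ["ALTER TABLE orders ADD COLUMN note TEXT"])

# Column names per tenant after the patch, read from SQLite's table metadata.
columns = {
    name: [row[1] for row in conn.execute("PRAGMA table_info(orders)")]
    for name, conn in tenants.items()
}
```

Returning the list of failed tenants makes it possible to retry or inspect just those databases afterwards.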
At a higher level, all multi-tenant applications follow one of three approaches:
1. One tenant's data lives in one database,
2. One tenant's data lives in one schema, or
3. Tenants share the same tables, which carry a tenant_id / account_id column (shared schema).
I usually find that developers use the following criteria when they evaluate these different approaches.
Isolation: Since you can put each tenant into its own database on one hand, and have tenants share the same table on the other, this becomes the most apparent dimension. If you provide your users raw SQL access or you're in a regulated industry such as healthcare, you may need strict guarantees from your database. That said, PostgreSQL 9.5 comes with row level security policies that make this less of a concern for most applications.
Extensibility: If your tenants are sharing the same schema (approach #3), and your tenants have fields that vary between them, then you need to think about how to merge these fields.
This article on multi-tenant databases has a great summary of different approaches. For example, you can add a dozen columns, call them C1, C2, and so forth, and have your application infer the actual data in each column based on the tenant_id. PostgreSQL 9.4 comes with JSONB support and natively allows you to use semi-structured fields to express variations between different tenants' data.
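The semi-structured option can be illustrated with a small sketch. It uses SQLite and Python's `json` module as a stand-in: the `extra` column here is plain text holding JSON, whereas in PostgreSQL it would be a JSONB column that can be queried with JSON operators. The `customers` table, the field names, and the sample values are made up.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Shared columns are ordinary columns; tenant-specific fields go into `extra`.
conn.execute("CREATE TABLE customers (tenant_id INTEGER, name TEXT, extra TEXT)")

rows = [
    (1, "Alice", {"loyalty_tier": "gold"}),   # tenant 1 tracks loyalty tiers
    (2, "Bob", {"vat_number": "DE123"}),      # tenant 2 tracks VAT numbers
]
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(tenant_id, name, json.dumps(extra)) for tenant_id, name, extra in rows],
)

def extra_fields(tenant_id):
    """Return each customer's tenant-specific fields as a dict."""
    cursor = conn.execute(
        "SELECT name, extra FROM customers WHERE tenant_id = ?", (tenant_id,)
    )
    return {name: json.loads(extra) for name, extra in cursor}
```

Compared with generic C1, C2 columns, the field names travel with the data, so the application does not need a per-tenant mapping to interpret them.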
Scaling: Another criterion is how easily your database would scale out. If you create a tenant per database or schema (#1 or #2 above), your application can make use of existing Ruby Gems or Django packages to simplify app integration. That said, you'll need to manually manage your tenants' data and the machines they live on. Similarly, you'll need to build your own sharding logic to propagate foreign key constraints and ALTER TABLE commands.
With approach #3, you can use existing open source scaling solutions, such as Citus. For example, this blog post describes how to easily shard a multi-tenant app with Postgres.
It's time for me to give back to the community :) So after 4 years, our multi-tenant platform is in production and I would like to share the following observations/experiences with all of you.
We used a database per tenant. This has given us extreme flexibility, as the databases in the backups are not huge and hence we can easily import them into our staging environment to investigate customer issues.
We use Liquibase for database development and upgrades. This has been a tremendous help to us, allowing us to package the entire build into a simple war file. All changes are easily versioned and managed very efficiently. There is a bit of a learning curve here and there, but nothing substantial. Investing 2-5 days up front can save you significant time.
Given that we use Spring/JPA/Hibernate, we use a technique called Dynamic Data Source Routing. So when a user logs in, we look up the related data source and connect their session to the right database. That's also when the Liquibase scripts get applied for updates.
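The lookup idea behind this routing can be sketched in a few lines. This is not the Spring mechanism the author used, just a plain-Python illustration with SQLite stand-ins; the `TenantRouter` class, the tenant names, and the `invoices` table are hypothetical.

```python
import sqlite3

class TenantRouter:
    """Maps a tenant name to its own database connection (hypothetical helper)."""

    def __init__(self, sources):
        self._sources = dict(sources)  # tenant name -> database location
        self._connections = {}         # opened lazily, one per tenant

    def connection_for(self, tenant):
        if tenant not in self._sources:
            raise LookupError(f"unknown tenant: {tenant}")
        if tenant not in self._connections:
            self._connections[tenant] = sqlite3.connect(self._sources[tenant])
        return self._connections[tenant]

router = TenantRouter({"acme": ":memory:", "globex": ":memory:"})

# At login, the session is handed the connection for the user's tenant.
acme = router.connection_for("acme")
acme.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY)")

# Another tenant gets a different database and does not see acme's table.
globex = router.connection_for("globex")
globex_tables = [
    row[0]
    for row in globex.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
```

The first lookup for a tenant is also a natural hook for running pending migrations before the session starts using the connection.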
This is it for now; I will come back with more later on.
Well, there are problems with one database for all tenants in our case for sure.
The backup file gets huge and becomes almost impractical to manage.
For troubleshooting, we need to restore a customer's data in our dev environment. We just use that customer's backup file, which is usually much smaller than it would be if we used one database for all customers.
Again, Liquibase has been key in allowing us to manage updates across all the tenants seamlessly and without any issues. Without Liquibase, I can see lots of complications with this approach. So Liquibase, Liquibase and more Liquibase.
I also suspect that we would need more powerful hardware to manage a huge database with large joins across millions of records vs. a much lighter database with much smaller queries.
In case of problems, the service doesn't go down for everyone; the impact is limited to one or a few tenants.
In general, for our purposes, this has been a great architectural decision and we are benefiting from it every day. One time we had one customer that didn't have their archiving active and their database size grew to over 3 GB. With offshore teams and slower internet as well as storage/bandwidth prices, one can see how things may become complicated very quickly.
Hope this helps someone.
--Rex
Currently, all my collections are maintained in a single database.
I'm a little confused on when I should separate my collections into multiple databases, as some of the collections aren't necessarily related.
multiple databases:
can refine security permissions
separation of concerns
single database
easy
There are a set of tables I access all the time, and a set of tables I access about once a month. It makes some sense to open a persistent connection to a database containing my always-used tables, and open a connection to a database containing the sparsely-used tables when needed.
But is there any performance difference to having all my data in the same database? Is there any general rule of thumb for when to use multiple databases (other than production, development, etc.)?
Check here for a similar question with some useful, more in-depth answers: Is it better to use multiple databases when you are managing independent sets of things in MongoDB?
As MySQL, SQL Server, PostgreSQL, etc. are basically different implementations of the same concept (RDBMS), I am wondering whether the same relationship exists between LDAP and MongoDB/CouchDB etc., or is there something more to LDAP?
LDAP
Hierarchical Database model (based on parent/child relationships, like in XML)
LDAP is appropriate for any kind of directory-like information, where fast lookups and less-frequent updates are the norm
Scalable
Standard protocol
Not suited for applications that require data integrity (banking, ecommerce, accounting). Traditionally is used to store users, groups, SSL certificates, service addresses, but is a generic database and can be used for any information.
MongoDb
Document oriented Database, based on BSON (JSON-like) documents
Key value database, but values can be BSON documents
High performance in both read and write operations
Scalable (Master-Slave replication)
Custom protocol
Not suited for applications that require data integrity (banking, ecommerce, accounting)
CouchDb
Document oriented Database, based on JSON documents
Key value database, but values can be JSON documents
High performance in both read and write operations
Scalable (Master-Master replication with conflict resolutions)
REST protocol
Not suited for applications that require data integrity (banking, ecommerce, accounting)
The most important thing that distinguishes LDAP databases from other NoSQL stores, like MongoDB or CouchDB, is a very flexible ACL system.
For example, you can grant access to an object in the tree using groups and users stored in the same tree. In fact, you can use the objects themselves to authenticate against the LDAP server.
IMHO, it is completely safe to allow clients to access the LDAP tree directly from the Internet without writing a line of code.
On the other hand, LDAP has a somewhat archaic design and uses complicated approaches for trivial operations. Mainly because of that, I keep dreaming about someone implementing LDAP-like ACLs in one of the modern NoSQL databases. Indeed, why make a JSON-based database if you cannot be authorized against it directly from the browser?
SCHEMA is one of the biggest differences.
LDAP data stores have a single system-wide extensible schema (which, in the real world, is the Achilles heel of LDAP server replication...).
NO-SQL has 'no schema' (-or- any schema per object, look at it however you want..).