New to Stack so forgive me if I'm doing something incorrectly here...
We are currently working in postgressql 10, servicing multiple customers, setup using centralized tables partitioned by a customer id. The setup is AWS/EC2, with the traditional ETL approach of each client file going into their own table in an import schema (import.clientA_members) before being standardized and going to prod.members with clientA used as a partition.
Recently, a few clients have brought "Bring your own key" requirements and I am struggling with the best approach to implement this. The setup seems pretty straightforward if we had a database per client but as it stands right now that would cause some pretty big shifts on how we approach everything here from scalability and efficiency to reporting and analytics.
Has anyone here run into this situation? I'm not a DBA, more of an operator, so I'm not sure what the options are in terms of encrypting partitions with different keys, and those keys working well with the AWS Key Management Store.
Thanks in advance!
Related
Please correct me if I am wrong but I guess handling more requests and load by adding more machines or balancing the load between multiple servers is horizontal scaling. So, if I add more servers, how do I distribute the database? Do I create only one database to hold the user records with multiple servers? Or do I split the database too? Then what about database integrity? How to synchronize it? I am a newbie and really confused but eager to learn. I would like to use postgresql for my project and would like to know some basic things before I start. What I want to do is, create two servers for the application load balancing (please correct me if its not what I have to do). How will I manage the database across these without database integrity? Do I have to replicate the data within the two servers? How do you guyz have multiple instance and manage database? Do I need to go through sharding for this? What would be the best approach to have many instance without database integrity in accordance to postgresql. I would really appreciate if anyone could explain it to me. Thank you!
Not sure if you are looking just for a service, which could bring you what you need so you don't have to spend time on it, or if on the other hand, you would like to implement that on your end, which I guess could be quite complex.
If you are looking for your own solution, maybe you should take a look at Postgres-XC, which is a group that provides a database cluster based upon PostgreSQL database.
On the other hand, if you are just interested in the development process and don't want to spend time on this when you can have it on the cloud, maybe you would like to take a look at EnterpriseDB which provides PostgreSQL on the cloud.
For your application, you can also use a cloud service in which you can even auto-scale your app depending on some parameters as it is explained here.
We would like to use OrientDB Graph in an Azure environment. Does anybody has experience using it? We also would like to know if high availability from OrientDB is required under Azure cloud? Azure already offers high availability for Azure storage, Azure Drive and SQL. I understand that they have replications and load balancing built in.
This is super important because we prefer not to get into the business of replications and infrastructure management.
Thanks
So you can spin up 2 or more machines and install OrientDB on them, then configure them together as a distributed cluster. However I haven't been able to find any way that is simpler, easier to do. I am interested in this topic too.
Azure does have features such as geo-replication, which is protects your data against a major data-center incident but doesn't provide any performance benefit and will not make it highly available.
Although pretty reliable, occasionally Microsoft will reboot servers for updates, so to protect against downtime you can use affinity groups so that, of your 2 or more servers, one will always be online. This however does need to be used in conjunction with database replication and ideally load balancing.
It's also worth noting that OrientDB recommends clusters have an odd number of servers as this can prevent conflicts when synchronising data after a communication issue between the servers.
I am using it in amazon and I had to create a java project to monitor http requests inserts and queries. The queries are very fast but takes longer inserting data .
I recommend this type of graph database mode to decrease the time of the queries. Also if you have empty fields OrientDB manages very well compared to other databases .
If you need help with the java project can response to this post and I´ll help u.
I hope it helps. Good luck.
I want to write a high scalable web application for selling event tickets. I want to use NoSQL database, like Big Table or MongoDB and Cloud Service like Google App Engine (GAE) or Amazon Elastic Compute Cloud (Amazon EC2)
Is it posible using this type of database to be sure that two client will not be able to buy a ticket for the same place simultaneously? Or may be I will have to use RDBMS database and forget about Google App Engine?
Things like GAE's datastore can still support transactional semantics, for example:
http://code.google.com/appengine/docs/python/datastore/transactions.html
So yes, it is possible to do what you're seeking to do. (Note - GAE's Datastore is not exactly NoSQL, since it uses SQL-like queries.)
I have a problem with this question. Not all NoSQL databases are created equally, and different NoSQL databases have different ways they store data. Generally the thing you should be worried about are: data is actually written to disk and not just into memory. Most NoSQL databases can do this but not by default. Let's just say this is not a problem, you can usually tell the database like MOngo or Cassandra to write data to disk, can even tell how many servers at minimum the data should be written to.
The problem is that you may not get a true transactional support. When you deal with ecommerce it's important to have all or nothing type of transation where several operations either succeed completely or rolled back. There must be absolutely no chance that only part of your data is saved. For example, if you need to write data to more than one table (collection or document in NoSQL lingo), if server goes down in the middle of the process and your data is only written to one table, that's usually unacceptable in ecommerce.
I am not familiar with all NoSQL databases, but the ones I know don't have this option yet.
MySQL, on the other hand, does.
If transactional support or lack of it does not bother you, then I think its OK to use NoSQL as long as you tell it to save data to disk and not just into memory.
The answer is 'maybe.'
Depending on what you're trying to build, you many be able to use some of the techniques in this post:
http://kylebanker.com/blog/2010/06/07/mongodb-inventory-transactions/
Using something like get_or_insert you can easily ensure that two clients are not receiving the same resource simultaneously on Google App Engine. However, there are big differences between GAE and a RDBMS, so make sure you study them further before you make a decision.
It's obvious that I'm not an expert on Cassandra. So the question may sound silly.
Given an existing SQL-based project does it give any benefit or is it even possible to apply a no-SQL database(e.g. Cassandra) as an additional layer between business logic and SQL database to speed up our queries or inserts.
It's relatively new technology and I'm trying to find its place.
Cassandra will work fine, but if you don't care if you have to rebuild your data memcached will be faster.
But if you want a persistent cache, Cassandra is probably your best option -- reddit started by using Cassandra like this and is working on moving more functionality to it.
I would go with Windows Server AppFabric aka Velocity distributed cache with SQL Server, assuming you are on the .NET platform.
Scott Hanselman has a bunch of posts on AppFabric.
I have developed a number of departmental client-server applications, and am now ready to begin working on moving one of these applications to a SaaS model. I have done some basic web development, but I'm a newbie when it comes to SaaS architectures.
One of the first questions that comes to mind as I try to design the architecture is the question of single vs. multi tenancy. The pros and cons of each vary significantly depending on the type of application and scale required, so I'd like to describe my application and scale needs below, and hope others can comment on how I should get started with the architecture.
The client-server application currently consists of a Firebird database and a Windows application. The database contains about 20 tables containing a few thousand records in 4 primary tables, and a few hundred records in various lookup and related tables. Although the number of records is small, the size can get large, as the database can contain large BLOBS. Each customer sets up their own database and has a handful of users within the organization connected to it. When I update the db schema, a new windows application is released, and it checks the db schema and then applies the updates as needed.
For the SaaS application, I am designing for 100's (not 1000's or millions) of new customers per year. My first thought was to go with a multi tenancy model to make updates easy (shut down apply the updates to one database, and then start up). On the other hand, a single tenancy model would provide a means to roll updates out to a group of customers at a time, and spread the risk of data corruption - i.e. if something goes wrong with a database, it will impact one customer instead of all customers. With this idea, I was thinking of having a single web front-end which would connect to a single customer database upon login. Thus, when a new customer creates an account, a new database would be created (each customer would have their own db with multiple users as needed for the customer).
In this model, a db update would require either a process to go through each db to apply schema changes, or a trigger upon logging in to initiate a schema update similar to the client-server model currently in use.
Can anyone point me to information for similar applications which have been ported from client-server to SaaS? Or provide any pointers to consider? Basically I'm looking for architecture examples of taking a departmental application and making it available as a self service website for multiple customers. Thanks for any suggestions, resources, etc.
Good questions.
One thing that comes to mind is that if you have multiple databases which you roll out in a staged manner to reduce the likelihood of breaking all of your customers, you will have to address the issue of what to do if the db structure changes. You will either have to be very rigorous with respect to maintaining backward compatibility, or else deploy separate versions of your code base and somehow manage which tenants are associated with which databases.
We are providing our application using a SaaS model as well.
It was, initially a Windows app which worked similar to your multiple database proposal. Upon login, the win app would authenticate against a single "licensee" database which would then respond with connection information for a database specific to that licensee. The nice thing about this was that it provided 1) physical separation of licensee data, which our customers liked and 2) enabled us to physically locate the database on a server geographically closer to the users which both improves performance and avoids some potentially tricky legal and regulatory issues with respect to providing data across country boundaries.
Of course, since the app was a thick client app, we could get away with making database changes and pushing them out to one licensee at a time. When we were ready to upgrade, we could push out an updated thick client in conjunction with the new database - thereby ensuring that the codebase was a match with the database. As long as the common "licensee" authentication database stayed consistent, this worked fairly well.
On the other hand, though, this solution brought with it all of the problems of maintaining and managing a thick client approach which finally lead us down the thing client, browser-based approach.
In our new model, everything is in a single database. When we have updates, we push both the code and the db out at the same time. This solves the problem of keeping the code base consistent with the database structure. However, we are now confronted with the issues mentioned in #s 1 and 2, above, which we have yet to resolve.
I hope this provides some food for thought for you.
I, too, am interested in this question.
Thanks for the post.
-S