Multiple server instance load and database balancing - postgresql

Please correct me if I am wrong but I guess handling more requests and load by adding more machines or balancing the load between multiple servers is horizontal scaling. So, if I add more servers, how do I distribute the database? Do I create only one database to hold the user records with multiple servers? Or do I split the database too? Then what about database integrity? How to synchronize it? I am a newbie and really confused but eager to learn. I would like to use postgresql for my project and would like to know some basic things before I start. What I want to do is, create two servers for the application load balancing (please correct me if its not what I have to do). How will I manage the database across these without database integrity? Do I have to replicate the data within the two servers? How do you guyz have multiple instance and manage database? Do I need to go through sharding for this? What would be the best approach to have many instance without database integrity in accordance to postgresql. I would really appreciate if anyone could explain it to me. Thank you!

Not sure if you are looking just for a service, which could bring you what you need so you don't have to spend time on it, or if on the other hand, you would like to implement that on your end, which I guess could be quite complex.
If you are looking for your own solution, maybe you should take a look at Postgres-XC, which is a group that provides a database cluster based upon PostgreSQL database.
On the other hand, if you are just interested in the development process and don't want to spend time on this when you can have it on the cloud, maybe you would like to take a look at EnterpriseDB which provides PostgreSQL on the cloud.
For your application, you can also use a cloud service in which you can even auto-scale your app depending on some parameters as it is explained here.

Related

Making DB Query as a Microservice API

We are currently using direct DB connection to query mongodb from our scripts and retrieve the required data.
Is it advisable / best practice to make the data retrieval from DB as a microservice.
It does until it doesn't :)
A service needs to get its data from somewhere and a database is a good start. If you have high loads you may find that you need to add a cache in the middle see this post from Instagram engineering https://instagram-engineering.com/thundering-herds-promises-82191c8af57d
edit (after comment)
generally speaking, a service should own its database and other services shouldn't access another database service directly only via its API. The idea is to keep services autonomous and enable them to evolve independently.
Depending on the size of microservice, that's now always practical since it can make the overhead of having the service be more of the utility it provide (I call this nanoservices). Also, if you have a lot of services you don't want to allow each one to talk to any other (even not via the DB) since you just get a huge mess. The way I see it there should be clear logical boundaries (services or microservices) and then within each such logical services you may find that it makes sense to have more than one "parts" (which I call aspects) e.g. they have different scaling needs or different suitable technologies etc. When you set things this way aspects can access the same database and services shouldn't (and you can still tame the chaos :) )
One last thing to think about - who said API is only a REST API, you can add views on top of the data that belongs to another service and as long as you treat that like an API (security, versioning etc.) you can have other services access that as well

How to have complete offline functionality in a web app with PostgreSQL database?

I would like to give a web app with a PostgreSQL database 100% offline functionality. In an ideal case the database should be completely replicated in the browser per user, and synchronized when online. So that the same code can be used to talk to both the offline and online database. I know this is possible with PouchDB and CouchDB, but have not found a solution that works with PostgreSQL. Is this at all possible?
Short answer: I don't know of anything like this that currently exists.
However, in theory, this could be made to work...(long answer:)
Write a PostgreSQL backend for levelup (one exists for MySQL: https://github.com/kesla/mysqldown)
Wire up pouch-server to read/write from your PostgreSQL db using pouchdb's existing leveldb adapter (which in turn will have to be configured to use your postgres backend). Congrats, you can now sync data using PouchDB!
Whether an approach like this is practical in reality for your application is a different question you'll have to answer.
You may be wondering, for example, "will I be able to sync an existing complex schema with multiple tables to the client with this approach?" The answer is probably not - the mysqldown implementation of leveldown uses a single MySQL table with three fields: id, key, and value (source), and I imagine any general-purpose PostgreSQL adapter would be similar (nothing says you can't do a special-purpose adapter just for your app though!).
On the other hand, if you were to implement a couchdb-compatible API (or a subset- you may not need attachments, for example) over your existing database schema, there's nothing stopping you from using PouchDB on the client to talk directly to that as if it were an actual CouchDB - just pop in the URL and call replicate()! Implementing the replication protocol might be a fair bit of work, since you'd need to track revisions and so on somewhere - but again, technically not impossible!
There are also implementations of levelup's backend storage that are designed for browsers. See level.js, which could be another way to sync between a server-side Postgres levelup backend and the browser.
TL;DR: There's tons of work being done around Javascript databases right now. Is syncing with Postgres impossible? probably not. Would it be a lot of work? Definitely. Worth it? Who knows, but it would be cool.
Without installing PostgreSQL on the client? No. Obviously you can cache data for offline use, but an entire RDBMS+procedural languages in Javscript, no.

OrientDB in Azure

We would like to use OrientDB Graph in an Azure environment. Does anybody has experience using it? We also would like to know if high availability from OrientDB is required under Azure cloud? Azure already offers high availability for Azure storage, Azure Drive and SQL. I understand that they have replications and load balancing built in.
This is super important because we prefer not to get into the business of replications and infrastructure management.
Thanks
So you can spin up 2 or more machines and install OrientDB on them, then configure them together as a distributed cluster. However I haven't been able to find any way that is simpler, easier to do. I am interested in this topic too.
Azure does have features such as geo-replication, which is protects your data against a major data-center incident but doesn't provide any performance benefit and will not make it highly available.
Although pretty reliable, occasionally Microsoft will reboot servers for updates, so to protect against downtime you can use affinity groups so that, of your 2 or more servers, one will always be online. This however does need to be used in conjunction with database replication and ideally load balancing.
It's also worth noting that OrientDB recommends clusters have an odd number of servers as this can prevent conflicts when synchronising data after a communication issue between the servers.
I am using it in amazon and I had to create a java project to monitor http requests inserts and queries. The queries are very fast but takes longer inserting data .
I recommend this type of graph database mode to decrease the time of the queries. Also if you have empty fields OrientDB manages very well compared to other databases .
If you need help with the java project can response to this post and I´ll help u.
I hope it helps. Good luck.

Is NoSQL suitable for Selling Tickets Web Application?

I want to write a high scalable web application for selling event tickets. I want to use NoSQL database, like Big Table or MongoDB and Cloud Service like Google App Engine (GAE) or Amazon Elastic Compute Cloud (Amazon EC2)
Is it posible using this type of database to be sure that two client will not be able to buy a ticket for the same place simultaneously? Or may be I will have to use RDBMS database and forget about Google App Engine?
Things like GAE's datastore can still support transactional semantics, for example:
http://code.google.com/appengine/docs/python/datastore/transactions.html
So yes, it is possible to do what you're seeking to do. (Note - GAE's Datastore is not exactly NoSQL, since it uses SQL-like queries.)
I have a problem with this question. Not all NoSQL databases are created equally, and different NoSQL databases have different ways they store data. Generally the thing you should be worried about are: data is actually written to disk and not just into memory. Most NoSQL databases can do this but not by default. Let's just say this is not a problem, you can usually tell the database like MOngo or Cassandra to write data to disk, can even tell how many servers at minimum the data should be written to.
The problem is that you may not get a true transactional support. When you deal with ecommerce it's important to have all or nothing type of transation where several operations either succeed completely or rolled back. There must be absolutely no chance that only part of your data is saved. For example, if you need to write data to more than one table (collection or document in NoSQL lingo), if server goes down in the middle of the process and your data is only written to one table, that's usually unacceptable in ecommerce.
I am not familiar with all NoSQL databases, but the ones I know don't have this option yet.
MySQL, on the other hand, does.
If transactional support or lack of it does not bother you, then I think its OK to use NoSQL as long as you tell it to save data to disk and not just into memory.
The answer is 'maybe.'
Depending on what you're trying to build, you many be able to use some of the techniques in this post:
http://kylebanker.com/blog/2010/06/07/mongodb-inventory-transactions/
Using something like get_or_insert you can easily ensure that two clients are not receiving the same resource simultaneously on Google App Engine. However, there are big differences between GAE and a RDBMS, so make sure you study them further before you make a decision.

Strategies for "Always-Connected" Windows Client Data Architecture

Let me start by saying: this is my 1st post here, this is a bit lenghty, and I havent done Windows Forms development in years....with that in mind please excuse me if this isn't directly a programming question and please bear with me as I really need the help!!
I have been asked to develop a Windows Forms app for our company that talks to a central (local area network) Linux Server hosting a PostgreSQL database. The app is to allow users to authenticate themselves into the system and thereafter conduct the usual transactions with the PG database. Ordinarily, I would propose writing a webforms app against Mono, but the clients need to utilise local resources such as USB peripheral devices, so that is out of the question. While it might not seem clear, my questions are italised below:
Dilemma #1:
The application is meant to be always connected. How should I structure my DAL/BLL - Should this reside on the server or with the client?
Dilemma #2:
I have been reading up on Client Application Services (CAS), and it seems like a great fit for authentication, as everything is exposed via URIs. I know that a .NET Data Provider exists for PostgreSQL, but not too sure if CAS will all work on a Linux (Debian) server? Believe me, I would get my hands dirty and try myself, but I need to come up with a logical design first before resources are allocated to me for "trial purposes"!
Dilemma #3:
If the DAL/BLL is to reside on the server, is there any way I can create data services, and expose only these services to authenticated clients. There is a (security) requirement whereby a connection string with username and password to the database cannot be present on any client machines...even if security on the database side is quite rigid. I'm guessing that the only way for this to work would be to create the various CRUD data service methods that are exposed by an ASP.NET app, and have the WindowsForms make a request for data or persist data to the ASP.NET app (thru a URI) and have that return a resultset or value. Would I be correct in assuming this? Should I be looking into WCF Data Services? and will WCF work with a non-SQL Server database?
Thank you for taking the time out to read this, but know that I am desperately seeking any advice on this! THANKS A MILLION!!!!
EDIT:
I am considering also using NHibernate as my ORM
Some parts of your questions are complicated and beyond my expertise. However, in general you can do almost anything you put effort into, CAP theorem and the like aside.
DAL/BLL stuff in general can reside in any of the tiers. I put a lot of this in my database and some in the middle tier, however this is to allow re-use in different environments which may or may not be a goal for you. The thing is I would think through carefully the separation of concerns issues here and what sorts of centralization of logic you want to place. The further back, the more re-usable this becomes but this is not always a free tradeoff.
I am not entirely familiar with CAS but it looked like AJAX kinds of stuff from what I saw on the MSDN web site. That could be wrong, but if it is right, then you have an issue in that such requests may be stateless and this could be an issue if you need a constant connection.
On the whole based on what you are saying it sounds cleanest to do a two tier rather than a three tier app, and have the DAL/BLL sit on the client, possibly supported by stored procedures in the server. You can then set PostgreSQL up to authenticate against whatever you use on your network (KRB5 if AD is what I would recommend). This simplifies your data access, and it allows you to control permissions based on the authentication against the database. Since you can authenticate users based on AD, you can then set permissions accordingly.
One important consideration is going to be number of connections. PostgreSQL does have some places where every current connection must be checked and iterated through, and connection startup and tear-down overhead in some cases can be significant. So one important decision will involve connection pooling. Whether or not you use connection pooling to boost performance will depend on what you are doing but I have seen cases where PostgreSQL has handled 600 connections without serious problems.