As far as I know, database connectivity technologies like Entity Framework open and close connections automatically to enhance scalability. (Managing Connections and Transactions)
For example, a form using ASP.NET MVC and Entity Framework will connect to retrieve a record, then immediately disconnect and remain disconnected until I modify the data in the controls and save it.
I wonder if the same behavior applies to an Access 2013 form linked via ODBC to SQL Server. Once a record is retrieved, is the connection closed until my next operation, or does the connection remain open until I close the form? Is the behavior configurable?
The fact or existence of a connection does not change nor increase scalability for typical applications. So if you have 10, or 1000 connections, and those connections are NOT doing anything, then SQL Server is not doing any work, and hence no increase in scalability will occur in these typical cases.
And OFTEN there is additional chatter over the network to open the connection, pull the data, and close the connection.
Then when you write the data back, you AGAIN have the same steps: open the connection, write the data, and then close the connection!
In fact keeping the connection open means you don’t waste network bandwidth opening and closing the connection!
The MAIN reason for disconnected datasets is that such connections work far more reliably when you have a poor or less-than-ideal connection (such as over the internet, or via Wi-Fi at a coffee shop). In these cases, if the open-connection command fails, then the connection simply does NOT occur, and you don't pull any data. And if a bit of time delay or a re-try occurs as the connection is re-attempted, then no big deal. So you grab that data and close the connection.
However, as noted, this frequent opening and closing causes additional overhead. Still, given how the internet works (as opposed to a typical office network), this disconnected approach is very much the norm for pulling data over the internet, or when using something like Wi-Fi. So the approach is one of expecting that a minor disconnect can and will occur.
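To make that concrete, here is a rough sketch of the disconnected open/pull/close pattern, with the kind of re-try on open described above. It is plain JDBC, since the pattern is the same in any stack; the connection string, table, and column names are invented for the example:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class DisconnectedFetch {
        // Hypothetical connection string; any JDBC data source behaves the same way.
        static final String URL =
                "jdbc:sqlserver://server;databaseName=Sales;user=app;password=secret";

        static String fetchCustomerName(int id) throws Exception {
            for (int attempt = 1; ; attempt++) {
                // Step 1: open (the step that fails on a flaky link).
                try (Connection con = DriverManager.getConnection(URL);
                     PreparedStatement ps = con.prepareStatement(
                             "SELECT Name FROM Customers WHERE Id = ?")) {
                    ps.setInt(1, id);
                    // Step 2: pull the data.
                    try (ResultSet rs = ps.executeQuery()) {
                        return rs.next() ? rs.getString(1) : null;
                    }
                    // Step 3: close happens automatically via try-with-resources.
                } catch (SQLException e) {
                    if (attempt == 3) throw e;        // give up after a few tries
                    Thread.sleep(1000L * attempt);    // short delay, then re-try
                }
            }
        }
    }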
The second "common" reason for this 3-step process is that other development platforms "promote" the use of disconnected data, because their forms are NOT bound to the actual data tables (or bound to a query). The downside of this disconnected approach is that you generally have to write code to pull the data down to the client, and THEN render the data from the recordset object to the form. The result is a TON of additional work to edit data in a form. So expect the typical ASP.NET application to cost 5 or even 10 times as much as writing that application in Access.
In the case of Access's bound-form model, the developer does not have to code the data pull, nor pull that data into some object and then close the connection. Once Access establishes a connection to SQL Server, that connection remains open until you shut down the Access application.
Keeping the connection open and active has the advantage of rapid application development due to the bound-form model. You DO NOT need to write code to pull data from the server and THEN transfer that data from some type of object into the form.
So the downside to the Access approach has little (if anything) to do with scalability. The downside is that a simple break in the connection is NOT handled well by Access at all.
So if you build a form in Access that is bound to a SQL Server table of, say, 1 million records, and you launch that form with a where clause of InvoiceNumber = 12356, then Access is smart, and ONLY pulls down that ONE record from SQL Server. So in terms of scalability and performance, using a disconnected system as opposed to the bound, connected model in Access will not result in a performance difference.
However, because Access keeps that connection open, any breakage in that connection will result in an ODBC error. Worse, Access is NOT able to recover from such errors when using bound forms; your only recourse is to restart Access.
It is certainly possible to build un-bound forms in Access, but this is NOT how Access was designed. In fact, if one is going to adopt a disconnected data model, then Access is the wrong tool, since it has no wizards or "developer aids" for such an approach. Say .NET has wizards built around the disconnected system; Access has tools built around the connected system. And a LOT of the functionality that makes Access such a great rapid application development tool is lost if you build un-bound forms in Access. Bound Access forms have MANY additional events that you will not find in, say, .NET forms.
So it is not scalability or performance that is lost by the bound-forms approach in Access; simple ease of development is the main feature and gain of the Access development approach.
A developer will STILL need to, and should, LIMIT the number of records pulled into a form (by use of the form's "where" clause).
(.NET forms don't have such indulgences as a where clause.)
So the major shortcoming is that Access does not recover from ODBC disconnections since it was designed to keep such connections open.
I apologise if the question is naive. I wanted to understand what could be a few possible use cases of the live query feature.
Let's say my database state changes, but it doesn't change every minute (or hour). If I execute a live query against my database/class/cluster, I'm not really expecting the callback to be called anytime soon. But, hey, I would still want to be notified when there's a state change.
My need with OrientDB is more along the lines of Elasticsearch's percolator bundled with a publish-subscribe system.
Is live query meant to cater to such use cases too? Or is my understanding of live query very limited? What could be a few possible use cases for the live query feature?
Thanks!
Whether or not Live Queries will be appropriate for your use case depends on a few things. There are several reasons why live queries make sense. A few questions to ask are:
How frequently does the data change?
How soon after the data changes do you need to know about it?
How many different groups of data (e.g. classes, clusters) do you need to deal with?
How many clients are connected to the server?
If the data does not change very often, or if you can wait a set period of time before an update, or you don't have many clients (hitting the DB directly), or if you only have one thing feeding the database, then you might want to just do polling. There is a balance between holding a connection open that you send a message on very infrequently (live queries) and polling too often.
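By way of comparison, the polling alternative looks roughly like this sketch (Java; fetchChangesSince is a hypothetical stand-in for whatever query you would actually run):

    import java.time.Instant;
    import java.util.List;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class Poller {
        private volatile Instant lastSeen = Instant.EPOCH;

        // Hypothetical stand-in for a real query such as
        // "SELECT FROM MyClass WHERE updatedAt > :lastSeen".
        List<String> fetchChangesSince(Instant since) {
            return List.of(); // run the query here
        }

        void start() {
            ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
            // Every tick costs a round trip and a query even when nothing changed -
            // this is the trade-off against holding one live-query connection open.
            ses.scheduleAtFixedRate(() -> {
                Instant now = Instant.now();
                fetchChangesSince(lastSeen).forEach(change ->
                        System.out.println("change: " + change));
                lastSeen = now;
            }, 0, 30, TimeUnit.SECONDS);
        }
    }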
For example: it's possible that you have an application server (Tomcat, Node, etc.) and that your clients connect to it via WebSockets. Now let's say your app server makes one (or a few pooled) live queries to the database. Now let's say your database has an update. It might just go from the database to the app server (e.g. Node). Node may now be responsible for fanning that message out across 100 WebSockets (one for each connected client). In this case, the fact that Node is connected to the database in a persistent way, with a live query open, is not that big of a deal.
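A rough sketch of that arrangement, using the live query API as shown in the OrientDB 2.x Java docs (class and method names should be verified against your driver version; ClientChannel is a hypothetical stand-in for a WebSocket session):

    import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
    import com.orientechnologies.orient.core.db.record.ORecordOperation;
    import com.orientechnologies.orient.core.record.impl.ODocument;
    import com.orientechnologies.orient.core.sql.query.OLiveQuery;
    import com.orientechnologies.orient.core.sql.query.OLiveResultListener;
    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;

    public class LiveFanOut {
        // Hypothetical stand-in for the connected WebSocket sessions.
        interface ClientChannel { void send(String json); }
        private final List<ClientChannel> clients = new CopyOnWriteArrayList<>();

        void subscribe(ODatabaseDocumentTx db) {
            // One live query held open by the app server...
            db.query(new OLiveQuery<ODocument>("live select from MyClass",
                    new OLiveResultListener() {
                @Override
                public void onLiveResult(int token, ORecordOperation op) {
                    String json = ((ODocument) op.record).toJSON();
                    // ...fanned out to every connected client.
                    for (ClientChannel c : clients) c.send(json);
                }
                @Override public void onError(int token) { /* log, maybe resubscribe */ }
                @Override public void onUnsubscribe(int token) { }
            }));
        }
    }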
The question is: if you have thousands of clients connected, do they all need an immediate update? If so, are you planning on having them poll at a short interval? If so, you could probably benefit from a live query. Lots of clients polling at a short interval will generate a lot of unnecessary traffic and queries.
Unfortunately, at the end of the day, the answer is: it depends. You probably need to prototype and then instrument under load to see what your tradeoffs are. But in principle, it is less about how frequently updates come, and more about how often you would have clients poll, and how many clients you have. If the answer is "short intervals and a lot of clients", give live queries a try.
We are evaluating different alternatives for multi-tenancy in our platform. We think that one database per customer is the way to go as data structure and requirements are completely different from one customer to another, and we want to keep them as isolated as possible.
However, we are facing the question of how to manage the connection to multiple databases. We don't want to have one app instance per customer. Instead we want to have a pool of app instances handling requests for all our customers, using the correct database depending on the customer.
Our concern is whether keeping connections open to many (maybe thousands of) databases will cause a performance issue. We are mainly worried about memory usage, so we are wondering what the client-side overhead is when opening a connection to the MongoDB server.
We are also thinking about moving the database access to a separate service, which would be responsible for handling the database connections for all customers. In this case, is there an existing tool that allows that kind of "multiplexing" of MongoDB databases? (See the sketch after the notes below for the sort of thing we mean.)
Some additional notes:
We discarded sharding. It won't fit our needs. We need different databases.
Databases will be on different servers with reserved resources. This means each database runs its own mongod process, so we need separate connections.
We use the Java driver.
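To make the "multiplexing" question concrete, here is a minimal sketch with the MongoDB Java driver (3.x API). Tenant and its lookup fields are hypothetical; the idea is one MongoClient (and therefore one connection pool) per backing server, shared across requests within an app instance, with each request routed to its tenant's database:

    import com.mongodb.MongoClient;
    import com.mongodb.MongoClientURI;
    import com.mongodb.client.MongoDatabase;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    public class TenantRouter {
        // One client (and thus one connection pool) per mongod server,
        // not one per tenant request.
        private final ConcurrentMap<String, MongoClient> clientsByServer =
                new ConcurrentHashMap<>();

        // Hypothetical: maps a customer to a server URI and a database name.
        record Tenant(String serverUri, String databaseName) {}

        MongoDatabase databaseFor(Tenant tenant) {
            MongoClient client = clientsByServer.computeIfAbsent(
                    tenant.serverUri(), uri -> new MongoClient(new MongoClientURI(uri)));
            return client.getDatabase(tenant.databaseName());
        }
    }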
I'm considering how to design a mechanism for replicating a (potentially large) MongoDB or other NoSQL (CouchDB, etc) database to dozens of clients at once. The clients would function like a replica set, but the replication would be one-way and the remote clients would belong to other parties. Specifically, I am looking for the following features:
real-time: changes to the master database should be pushed out to the clients as quickly as possible
replication to new clients: a new client must be able to connect, automatically sync the majority of existing data, then receive real-time updates.
efficient: both the initial synchronization/transfer of data and tracking of real-time updates ("diffs", if you will) are computationally efficient, with multiple clients connected.
secure: the master database presents an interface to which remote clients (who do not belong to the same owner or system) can connect: i.e., we cannot just add all the clients to the master's replica set.
robust: a temporary connection failure between a client and the master database should be easily and efficiently recoverable.
In some sense, the server is publishing a collection of data and the clients are subscribing to it. I realize that this is a hard software engineering problem, and to my knowledge no piece of software has implemented this exactly yet. However, some approaches have come to mind as close, which I'll list below.
Meteor's DDP protocol: It's designed to do this with Mongo-like collections and exactly implements the model of publishing and subscribing to a set of data (rather than a stream of messages). It manages the initial sync and sends along live changes. However, it's still in development, and far from being an industrial-strength solution - current drawbacks are that the server keeps a copy of every client's state in a possibly inefficient way, and it is only tested on collections that can fit in the memory of a web app. Also, it appears that DDP cannot efficiently sync an out-of-date database without fetching everything from scratch. If anyone can point to some examples of how large a collection can be synced over DDP, that would be great. (See also: https://stackoverflow.com/q/10128430/586086)
Broadcasting the Mongo oplog: Using a high-throughput message bus like Apache Kafka, one may be able to efficiently send the oplog to many clients at once. This tackles some of the system implementation challenges. However, this requires that the clients start with an initial sync that gets them close enough to the current master state somehow and then start replaying the oplog from the appropriate point.
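A bare-bones sketch of that idea with the MongoDB Java driver (3.x) and the Kafka producer API; the topic name and the resume-timestamp persistence are simplified assumptions:

    import com.mongodb.CursorType;
    import com.mongodb.MongoClient;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.MongoCursor;
    import com.mongodb.client.model.Filters;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.bson.BsonTimestamp;
    import org.bson.Document;

    public class OplogBroadcaster {
        public static void main(String[] args) {
            MongoClient mongo = new MongoClient("localhost");
            // The oplog is a capped collection on the master; requires a replica set.
            MongoCollection<Document> oplog =
                    mongo.getDatabase("local").getCollection("oplog.rs");

            java.util.Properties props = new java.util.Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            KafkaProducer<String, String> producer = new KafkaProducer<>(props);

            BsonTimestamp resumeFrom = new BsonTimestamp(0, 0); // persist this in practice
            // Tailable-await cursor: blocks until new oplog entries arrive.
            try (MongoCursor<Document> cur = oplog.find(Filters.gt("ts", resumeFrom))
                    .cursorType(CursorType.TailableAwait).iterator()) {
                while (cur.hasNext()) {
                    Document entry = cur.next();
                    // Kafka fans the entry out to however many clients subscribe.
                    producer.send(new ProducerRecord<>("oplog", entry.toJson()));
                }
            }
        }
    }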
Continuous replication a la CouchDB: I'm not sure how this is implemented and how robust it is, given the sparsity of the documentation. However, it does seem to work over remote database connections. How efficient is this, though, when multiple clients are trying to replicate at the same time? (A similar hack to this would be to make the clients MongoDB Priority 0 replica set members; however, that seems to be far from its intended use. See also: http://guide.couchdb.org/draft/replication.html)
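For reference, CouchDB kicks off continuous replication with a single POST to its _replicate endpoint. A bare-bones sketch, with hosts and database names invented:

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class CouchReplicate {
        public static void main(String[] args) throws Exception {
            // Ask the client's local CouchDB to continuously pull from the master.
            URL url = new URL("http://localhost:5984/_replicate");
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.setRequestMethod("POST");
            con.setRequestProperty("Content-Type", "application/json");
            con.setDoOutput(true);
            String body = "{\"source\":\"http://master.example.com:5984/mydb\","
                        + "\"target\":\"mydb\",\"continuous\":true}";
            try (OutputStream os = con.getOutputStream()) {
                os.write(body.getBytes(StandardCharsets.UTF_8));
            }
            System.out.println("HTTP " + con.getResponseCode());
        }
    }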
Please give pointers to software or pieces of software that already implement parts of this, or suggestions on the algorithms/data structures needed to do this efficiently.
If you are looking specifically for real-time replication, I'd recommend you look into SaaS offerings specifically for this purpose, such as https://www.firebase.com/
I am going to have a website with 20k+ concurrent users.
I am going to use MongoDB with one management node and 3 or more nodes for data sharding.
Now my problem is maximum connections. If I have that many users accessing the database, how can I make sure they don't reach the maximum limit? Also, do I have to change anything, maybe in the kernel, to increase the number of connections?
Basically the database will be used to keep track of the users connected to the site, so there are going to be heavy read/write operations.
Thank you in advance.
You don't want to open a new database connection each time a new user connects. I don't know if you'll be able to scale to 20k+ concurrent users easily that way, since MongoDB uses a new thread for each new connection. You want your web app backend to keep just one to a few database connections open and use those as a pool, particularly since web usage is very asynchronous and event-driven.
see: http://www.mongodb.org/display/DOCS/Connections
The server will use one thread per TCP connection, therefore it is highly recommended that your application use some sort of connection pooling. Luckily, most drivers handle this for you behind the scenes. One notable exception is setups where your app spawns a new process for each request, such as CGI and some configurations of PHP.
Whatever driver you're using, you'll have to find out how it handles connections and whether it pools them or not. For instance, Node's Mongoose is non-blocking, so you usually use one connection per app. This is the kind of thing you probably want.
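With the MongoDB Java driver (3.x legacy API), for instance, that amounts to one shared client whose internal pool serves every request, rather than one connection per user; this is a sketch, and the pool size is an illustrative number to tune under load:

    import com.mongodb.MongoClient;
    import com.mongodb.MongoClientOptions;
    import com.mongodb.ServerAddress;
    import com.mongodb.client.MongoDatabase;

    public class Db {
        // One client for the whole app; its internal pool is shared by all requests.
        private static final MongoClient CLIENT = new MongoClient(
                new ServerAddress("localhost", 27017),
                MongoClientOptions.builder()
                        .connectionsPerHost(100) // pool size: tune under load, not 20k
                        .build());

        public static MongoDatabase db() {
            return CLIENT.getDatabase("site");
        }
    }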
I'm in the planning stages of a .NET service which continually processes incoming messages, which involves various transformations, database inserts and updates, etc. As a whole, the service is huge and complicated, but the individual tasks it performs are small, simple, and well-defined.
For this reason, and in order to allow for easy expansion in future, I want to split the service into several smaller services which basically perform part of the processing before passing it onto the next service in the chain.
In order to achieve this, I need some kind of intermediary messaging system that will pass messages from one service to another. I want this to happen in such a way that if a link in the chain crashes or is taken offline briefly, the messages will begin to queue up and get processed once the destination comes back online.
I've always used message queuing for this type of thing, but have recently been made aware of SQL Service Broker which appears to do something similar. Is SQLSB a viable alternative for this scenario and, if so, would I see any performance benefits by using that instead of standard Message Queuing?
Thanks
It sounds to me like you may be after a service bus architecture. This would provide you with the coordination and fault tolerance you are looking for. I'm most familiar with, and partial to, NServiceBus, but there are others, including MassTransit and Rhino Service Bus.
If most of these steps initiate from a database state and end up in a database update, then merging your message storage with your data storage makes a lot of sense:
a single product to backup/restore
consistent state backups
a single high-availability/disaster recoverability solution (DB mirroring, clustering, log shipping etc)
database scale storage (IO capabilities, size and capacity limitations etc as per the database product characteristics, not the limits of message store products).
a single product to tune, troubleshoot, administer
In addition there are also serious performance considerations: having your message store be the same as the data store means you are not required to do a two-phase commit on every message interaction. Using a separate message store requires you to enroll the message store and the data store in a distributed transaction (even if it is on the same machine), which requires two-phase commit and is much slower than the single-phase commit of database-only transactions.
In addition, using a message store in the database, as opposed to an external one, has advantages like queryability (run SELECT over the message queues).
Now if we translate the abstract term 'message store in the database' as Service Broker and 'non-database message store' as MSMQ, you can see my point as to why SSB will run circles around MSMQ any time.
My recent experiences with both approaches (starting with SQL Server Service Broker) have left me crying to get my messages out of SQL Server. The problem is quasi-political, but you might want to consider it: SQL Server in my organisation is managed by a specialized DBA, while application servers (i.e. messaging like NServiceBus) are managed by developers and the network team. Any change to the database servers requires a painful performance analysis from the DBA, and is immersed in the fear that our queuing engine, living in the same space, might drag down standard SQL responsibilities.
SSB is pretty difficult to manage (not unlike messaging middleware), but the difference is that I am more allowed to screw something up in the messaging world (the worst that may happen is a pile of messages building up somewhere and logs filling up), and I can't afford any mistakes in the SQL world, where customers' transactional data lives and is vital for the business (including data from legacy systems). I really don't want to get those 'unexpected database growth' or 'wait time alert' or 'why is my tempdb growing without end' emails anymore.
I've learned that application servers are cheap. Just add message handlers, add machines... easy. Virtually no license costs. With SQL Server it is exactly the opposite. It now appears to me that using Service Broker for messaging is like using an expensive car to plow a potato field. It is much better suited for other things.