So, I've been researching for quite some time now, but it seems that all the results I'm getting are fairly outdated. I'd like to know what the current state of object-oriented database management systems (OODBMS) is in relation to fairly large games, such as an MMORPG, especially when compared to relational database management systems (RDBMS).
Obviously, the data model would depend on the data in question (i.e. the players vs. the world state, such as currencies and such).
So what are your thoughts? What are the current advantages of OODBMSs vs. RDBMSs? Have RDBMSs overcome some of the struggles that were evident a decade ago, such as scalability, performance, and cost?
Let's start with a general discussion of relational databases vs. object databases and then we can delve into the specifics of using an object database for game development.
Disclaimer: I work for Objectivity, Inc., where we build and sell a massively scalable object/graph database to our customers.
First, as you've observed, object databases do not have anywhere near the market adoption of relational databases, in general or in game development. This is true for a number of reasons, some technical and some non-technical, but you could boil it down to a "good enough" and/or "cheap enough" explanation. Relational databases are "good enough" for the vast majority of problems that people need to solve on a daily basis, and there are many free open-source relational databases available.
At Objectivity, we like to say that we have two kinds of customers: The brilliant and the desperate. Our brilliant customers understand that their data will be too complex to put into a relational database. It may be that their data model includes type inheritances that are difficult to express in a relational database product or that the performance degradation of an object-relational mapping layer is unacceptable. With a big enough hammer, you can always pound a square peg into a round hole, but at the end of the day, if performance matters, our brilliant customers understand that they need to choose a technology that aligns with their data model and performance requirements.
Our desperate customers are usually 2nd generation developers, people who show up to fix a project after the first team got tired of trying to pound the square peg into the round hole. These people are looking for solutions that provide high-speed data persistence, navigational queries, and potentially distributed databases to solve mission-critical problems.
The vast majority of our customers use our object database for four reasons:
The database allows them to easily model complex data.
The database allows them to link objects together and then navigate those connections orders of magnitude faster than relational databases.
The database is distributable and massively scalable.
The database is incredibly fast for both ingest and query.
Today we are seeing the dramatic emergence of graph databases. I believe that this is due to the growing importance of the relationships between data items in databases. Relational databases are not called relational because of their ability to efficiently handle relationships, but because they are designed to support relational algebra. Object and graph databases are specifically designed to support relationships between objects and, when appropriately designed, are typically much faster at navigating across dozens or hundreds of nodes and edges (relationships).
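To make the navigation-versus-join distinction concrete, here is a minimal Python sketch. It is not Objectivity's API or any real database's API; the `Node` class, both helper functions, and the sample data are hypothetical. It contrasts following direct object references with resolving the same hops through an edge table, roughly as a relational self-join would.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An object that holds direct references to related objects (hypothetical model)."""
    name: str
    links: list = field(default_factory=list)


def reachable_by_navigation(start, hops):
    """Follow object references directly for a fixed number of hops."""
    frontier = {id(start): start}
    for _ in range(hops):
        next_frontier = {}
        for node in frontier.values():
            for other in node.links:
                next_frontier[id(other)] = other
        frontier = next_frontier
    return sorted(node.name for node in frontier.values())


def reachable_by_join(edges, start_name, hops):
    """Resolve the same hops by re-scanning an edge table once per hop."""
    frontier = {start_name}
    for _ in range(hops):
        frontier = {dst for (src, dst) in edges if src in frontier}
    return sorted(frontier)


# The same small graph in both representations.
a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
a.links = [b, c]
b.links = [d]
c.links = [d]
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]

nav = reachable_by_navigation(a, 2)
joined = reachable_by_join(edges, "a", 2)
```

Both calls find the same nodes. A real RDBMS would use indexes rather than a full scan, so this says nothing about actual speed; it only shows that each hop is a lookup against a table in one model and a reference dereference in the other.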
Now let's discuss the applicability of object databases to game development...
There's absolutely no reason why an object database wouldn't be great for game development. Objectivity got its start as the database inside of an industrial CAD/CAM/CAE software architecture. It was the first time all three domains, CAD, CAM, and CAE, were seamlessly joined together using the same database. Objectivity has been and is still being used to run complex simulations on 3-D CAD models. Creating a multi-dimensional game world in an object database would be much more natural than trying to accomplish the same thing in a relational database. So yes, it would be possible to build an MMORPG using a graph database. You could start by creating the game universe in the database. Then you manage all of the players and their interactions as event updates from the game engine to the database. Our product has a memory caching feature where you only keep in memory those segments of the database that you are currently working on. In an MMORPG, you could have many different servers all hosting different parts of the game universe and they'd all be running unimpeded against different parts of the database. This is what some of our industrial customers do now.
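The idea of keeping only the active segments of the game universe in memory can be sketched roughly as follows. This is an illustration of the general technique (a least-recently-used cache keyed by region), not the product's actual caching feature; `SegmentCache`, `load_region`, and the region names are made up for the example.

```python
from collections import OrderedDict


class SegmentCache:
    """Keeps at most `capacity` world segments in memory, evicting the least recently used."""

    def __init__(self, loader, capacity):
        self._loader = loader            # function that reads a segment from the database
        self._capacity = capacity
        self._segments = OrderedDict()   # region id -> segment, oldest first
        self.loads = 0                   # how many times the database was hit

    def get(self, region_id):
        if region_id in self._segments:
            self._segments.move_to_end(region_id)   # mark as recently used
            return self._segments[region_id]
        segment = self._loader(region_id)
        self.loads += 1
        self._segments[region_id] = segment
        if len(self._segments) > self._capacity:
            self._segments.popitem(last=False)      # drop the least recently used segment
        return segment

    def resident(self):
        return list(self._segments)


def load_region(region_id):
    # Hypothetical stand-in for reading one part of the game universe from the database.
    return {"region": region_id, "players": set()}


cache = SegmentCache(load_region, capacity=2)
cache.get("forest")["players"].add("alice")
cache.get("desert")
cache.get("forest")   # already in memory, no database read
cache.get("city")     # cache is full, so "desert" is evicted
```

A real server would also have to write modified segments back before evicting them, which this sketch leaves out.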
I'm developing a brand new project in Scala. It's just an application for a bunch of CRUD operations; however, because of some eccentric requirements, neither Play2 nor Lift fits the bill, so I'm going to develop the application from the ground up. This means that Anorm or ScalaQuery become less obvious choices for database integration, and leaves me with the question: is it time to try something new?
My past technology stacks mostly included Java and PostgreSQL and I have experience with both ORM and plain SQL. Are NoSQL database management systems like MongoDB a good replacement for a typical RDBMS or are they special case application data stores? Also, how does the choice of database affect the greater Scala system design (if at all)? For example, the fact that you are using a JSON-like interface to talk to the database, and JSON between the web and a REST service, does not mean that much if everything in the middle becomes Scala objects, or does it?
I'm basically asking for someone's experience on moving from relational to object/document type databases, using Scala in particular. I know that good RDBMS integration is promised in the upcoming release of SLICK. So, if a company like TypeSafe decides to make RDBMS integration part of the TypeSafe stack, will I be swimming upstream by integrating with MongoDB using Casbah, for example?
Apologies if this question appears a bit vague. I do hope that someone with the right insights or experience will be able to help though.
Update:
Apologies for not adding links to SLICK (it being fairly new). Here goes:
Quick overview
Project home
Update 2:
My personal first win for a technology is usually developer productivity - this translates to lightweight and simple: quick to learn, easy to maintain, no magic
I am currently in a similar situation, and since I have some experience with web development and SQL databases, I took it as an opportunity to work with MongoDB, Casbah (and Scalatra). My experience is still very limited and the project and the amount of data I am working with are pretty small, but here are a few observations I've made.
For the few sets of data I have, performance does not seem to motivate either SQL or NoSQL. However, performance in the presence of huge amounts of data is often listed as a reason for using NoSQL, e.g., by Wikipedia.
My documents (entries in the database) arise from benchmarking test suites, and mainly have a static structure, and I am optimistic that I could store them in a fixed-schema SQL database. However, a few substructures are not static, e.g., new test cases are added, new statistics are tracked, others are removed. This was my main motivation for trying a schema-free NoSQL database. I also had the feeling that the document approach of MongoDB makes it much more obvious which data belongs together (i.e., to a document), in contrast to entries in a relational database, where the data would be distributed over various tables and rows, and where a full "document" would need to be reconstructed by joins.
Tools such as Lift-Json or Rogue allow you to work with regular Scala objects in a type-safe way, although the data is regularly (de-)serialised to (from) JSON. However, this naturally works best if the structure of your data is mainly static; otherwise, you are left with using strings to access your data (e.g., for expanding the results of a query using Casbah).
If you are mainly concerned about a coherent representation of data on server and client side, languages such as Opa or Haxe might be of interest, since they compile to code that can be executed on both sides. See this page for "multitarget" or "tierless" languages.
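As a rough illustration of that contrast, the sketch below stores one benchmark run as a single document and again as normalized rows that have to be joined back together. It uses plain Python dictionaries rather than MongoDB or SQL, and every field name, table name, and value is invented for the example.

```python
# Document form: everything that belongs to one run lives in one place, and a
# test case can carry an extra statistic without any schema change.
run_document = {
    "_id": "run-1",
    "suite": "sorting",
    "cases": [
        {"name": "quicksort", "millis": 12, "allocations": 3},
        {"name": "mergesort", "millis": 15},
    ],
}

# Relational form: the same data spread over three hypothetical tables.
runs = [{"run_id": "run-1", "suite": "sorting"}]
cases = [
    {"run_id": "run-1", "name": "quicksort", "millis": 12},
    {"run_id": "run-1", "name": "mergesort", "millis": 15},
]
extra_stats = [
    {"run_id": "run-1", "case": "quicksort", "stat": "allocations", "value": 3},
]


def rebuild_document(run_id):
    """Reassemble the full 'document' from the rows, as a set of joins would."""
    run = next(r for r in runs if r["run_id"] == run_id)
    doc = {"_id": run["run_id"], "suite": run["suite"], "cases": []}
    for case in cases:
        if case["run_id"] != run_id:
            continue
        entry = {"name": case["name"], "millis": case["millis"]}
        for stat in extra_stats:
            if stat["run_id"] == run_id and stat["case"] == case["name"]:
                entry[stat["stat"]] = stat["value"]
        doc["cases"].append(entry)
    return doc


rebuilt = rebuild_document("run-1")
```

Here the non-static statistics end up in a generic key/value table on the relational side, which is one common way of handling fields that come and go; other schema designs are possible.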
Got too long for a comment. Was just trying to relate my short experience with Scala (about 6 months now, since about when Play2 came out; it's quickly become my go-to language).
I've enjoyed using Salat/Casbah with MongoDB in my last few projects; most have been in Play2, but the latest was without a webapp framework. It definitely hasn't felt like swimming upstream.
I would say that there are particular use cases for which I wouldn't use mongo, but it works nicely as a general purpose object data store, especially if you expect to query by id or index and don't need transactions (and will need minimal ad-hoc aggregation type stuff).
Expect to require a separate set of servers dedicated to mongodb (or to use a service dedicated to mongodb), but I guess that's normal for most serious database apps.
I've also used Play2/Anorm, which was surprisingly enjoyable to use for some ad-hoc query dashboard-style report pages. I started trying to go the Squeryl route, but Anorm seemed easier to use for one-off aggregation queries. Haven't looked at SLICK, but it sounds interesting.
It's really hard to say without knowing what problems you would like the app to solve.
I've personally found my productivity increased using NoSQL DBs via REST/JSON. Though bear in mind most NoSQL DBs offer REST interfaces which preclude the need for much middleware, Scala or otherwise, unless you intend to write a webapp with a UI.
If this is a learning exercise, I recommend you try multiple things out, as each NoSQL DB has something different to offer to your toolkit. I have personally found that CouchDB, Riak, Neo4j, and MongoDB all have various pluses and drawbacks and are good for different purposes.
Hope this helps, good luck.
Membase is great for social games due to its low latency.
As I understand it, CouchDB is an MVCC system using a B+ tree, with a focus on append-only design.
(http://guide.couchdb.org/draft/btree.html)
One of the most important scenarios for Membase is social games.
Social games have a lot of write operations (50+%).
And a good portion of them are in-place updates.
So why is CouchDB a suitable persistent layer for Membase?
I'd also add that CouchDB's append-only log format really doesn't have much relation to whether application writes are new items or updates. The append-only format gives us much better reliability and performance than an in-place system (like sqlite...which is still quite reliable). It's also much easier to take backups of.
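The point that an append-only format is indifferent to whether a write is a new item or an update can be shown with a very small sketch. This is not CouchDB's file format (which is an append-only B+ tree on disk); `AppendOnlyStore` is a simplified, in-memory stand-in invented for the example.

```python
class AppendOnlyStore:
    """Every write appends a record; an index points at the latest record per key."""

    def __init__(self):
        self._log = []     # (key, value) records, never modified in place
        self._index = {}   # key -> position of the newest record for that key

    def put(self, key, value):
        # A brand-new key and an update to an existing key take the same path.
        self._log.append((key, value))
        self._index[key] = len(self._log) - 1

    def get(self, key):
        return self._log[self._index[key]][1]

    def log_length(self):
        return len(self._log)

    def compact(self):
        """Rewrite the log keeping only the newest record for each key."""
        live = [self._log[pos] for pos in sorted(self._index.values())]
        self._log = live
        self._index = {key: i for i, (key, _) in enumerate(live)}


store = AppendOnlyStore()
store.put("gold:alice", 10)
store.put("gold:bob", 5)
store.put("gold:alice", 25)   # an "in-place update" from the application's point of view
length_before = store.log_length()
store.compact()
length_after = store.log_length()
```

Because old records are never overwritten, a crash in the middle of a write cannot damage data that was already stored; the cost is that superseded records accumulate until compaction runs.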
Does Membase NEED an append-only log format? maybe not...does it NEED CouchDB?...YES!
The benefits of map-reduce and indexing as well as eventually consistent replication that CouchDB brings are nothing less than huge for Membase...and the benefits of low-latency, clustering and UI that Membase brings to CouchDB are arguably just as important.
(Disclosure: I work for Couchbase)
Perry Krug
CouchDB has great file formats, great ability to recover from crashes, sophisticated authentication and authorization tools, and a universal, standard, interface: HTTP. CouchDB is poor at low-latency queries, optimized memory utilization, and heavy update speeds (a million per second).
Membase currently has only a simple SQLite file format for persistence and less sophisticated authentication and authorization, and it uses a more obscure protocol. Membase is amazing for low-latency queries, ideal memory utilization, and heavy update speeds.
I think the two complement each other very well. Since the merging effort is coming from core developers in both projects, collaborating together, I expect to see the strengths of both and the weaknesses of neither. Yes, CouchDB is a good persistence layer for Membase.
Money speaks and if there ever was a vote of confidence then here it is, not only from a new lead investor but also from the existing ones as well.
http://www.couchbase.com/press-releases/couchbase-series-C
Besides, don't you think that Membase itself is more than well enough qualified to make an evaluation for such a merger decision?
I searched a bit for the differences between OODBMS and RDBMS. I pretty much know what they are. However, how do I decide which one is better for which applications? Can anyone kindly help me, please?
What I mean by factory management is: there are production lines that manufacture bottled, frozen, and other foodstuffs. The application manages everything from assigning staff to the lines to keeping the production records in the system. Which DBMS is better for such systems?
Thanks in advance.
Here is a nice article by Rick Grehan that describes cases where ODBMS are useful:
http://www.odbms.org/wp-content/uploads/2013/11/006.04-Grehan-When-to-Use-an-ODBMS-2005.pdf
Disclaimer: this is an "old curmudgeon" answer, from a guy who wrote plenty of perfectly functional accounting, manufacturing and other code before OOP came into the mainstream.
With that being said...
Factory management is classic relational database stuff, it's what it was invented to do. The code for classic relational apps tends to follow very predictable patterns, lots of loops over retrieved rows from tables, or straight pass-through stuff: passing data up to the UI or down to the database. If your DB is well-designed, the biz logic you code will be details in those loops or pass-throughs, but those two patterns will dominate.
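The "loop over retrieved rows" pattern described above might look roughly like this for the production-line case. It is only a sketch using Python's built-in `sqlite3` module with an in-memory database; the table names, the sample rows, and the `DAILY_QUOTA` threshold are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE line (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE assignment (line_id INTEGER NOT NULL REFERENCES line(id), staff TEXT NOT NULL);
    CREATE TABLE production (line_id INTEGER NOT NULL REFERENCES line(id), units INTEGER NOT NULL);
""")
conn.executemany("INSERT INTO line (id, name) VALUES (?, ?)",
                 [(1, "bottling"), (2, "frozen")])
conn.executemany("INSERT INTO assignment (line_id, staff) VALUES (?, ?)",
                 [(1, "Ana"), (1, "Ben"), (2, "Cy")])
conn.executemany("INSERT INTO production (line_id, units) VALUES (?, ?)",
                 [(1, 400), (1, 350), (2, 120)])

DAILY_QUOTA = 500  # hypothetical business rule

report = {}
totals = conn.execute("""
    SELECT l.id, l.name, SUM(p.units)
    FROM line AS l JOIN production AS p ON p.line_id = l.id
    GROUP BY l.id, l.name
    ORDER BY l.id
""").fetchall()

# The classic pattern: loop over the retrieved rows, with the business logic
# living as details inside the loop.
for line_id, name, units in totals:
    staff_rows = conn.execute(
        "SELECT staff FROM assignment WHERE line_id = ? ORDER BY staff", (line_id,))
    report[name] = {
        "units": units,
        "staff": [staff for (staff,) in staff_rows],
        "below_quota": units < DAILY_QUOTA,
    }

conn.close()
```

Nothing here needs classes: the tables carry the structure, and the code is a query plus a loop.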
An OODBMS on the other hand, from the point of view of this "old curmudgeon", attempts to recast the perfectly and efficiently functional RDBMS into something that will work with classes/objects, for no discernible gain over a system that has proven itself for decades to work extremely well. Classes have little or nothing to do with the classic code patterns that sit on top of relational databases. In fact, they tend to complicate things and can easily get in the way. I'm not saying don't use OOP code to deal with the database, just that OOP was invented for a different kind of problem, a problem that database apps don't happen to have.
The decision to choose an OODBMS or an RDBMS does not depend on the particular application, such as factory management or automation.
It depends on several criteria, such as:
1) Programming Paradigm - If you (the programmer) choose to design and implement in an OO programming language, then an OODBMS is suitable for storing the objects directly in the database. However, the most widely used type of DBMS is relational, because it is well established commercially and has a sound mathematical foundation.
2) Application Specific - For factory automation/management, responsiveness and fast access are important. OODBMSs are swifter than RDBMSs. If you are considering web development, then a lightweight tool like MySQL will fit well.
3) Trend - There is now a paradigm shift from legacy/structural to object/component-oriented programming. Given this trend, an OODBMS is best suited to enterprise applications like factory management.
It depends on the application layer you are using. If it takes a simple, more procedural approach (which can still have classes), an RDBMS is more suitable. Otherwise, if you lean towards a strictly object-oriented system, an OODBMS can be used.
I usually draw the line of usefulness at the point in which they need to be integrated into enterprise systems. If your project does not necessarily need to integrate, ODBMS is usually easier or technically superior. If you can integrate via web services or "push/pull" into an enterprise system DB, then you can still use an ODBMS, but there might be political pressure against it. (newer ODBMS/RDBMS replication like dRS for db4o may be a good fit) But if you need tight integration with legacy or enterprise datastores, then you're usually forced to use the RDBMS for one reason or another.
That said, your individual production lines might benefit greatly from an ODBMS, which is great at storing oft-changing complex object models and schemas, while the orchestrator system could follow my previous line of thinking.
I've been using ODBMS for many years, and have been dreading the project which requires me to return to purely relational data management. Although recent improvements in ORM tooling have made relational much more pleasant to work with, the ORM+RDBMS solution still can't keep up with ODBMS systems in a few key areas (see the previously mentioned article on odbms.org).
I have been stumbling on RDBMS alternatives very often nowadays, and I am following some of the open-source implementations.
What I understand is that they are best suited to large-scale web apps (like Google and Amazon), and that they mainly concentrate on very large distributed data stores.
How could this help small startups looking for an alternative to existing, costly data stores? And does it really yield both performance and maintenance gains for small applications?
I just started this discussion in the belief that somebody here has already hit the same frustrations trying these new approaches and has gained experience with them. This may help startups like us.
It all depends on your scaling requirements. RDBMSs require locks to work and so can only really be scaled "up". NoSQL-style DBs such as Google's Bigtable and CouchDB are massively scalable and very cheap, but it can get very complicated to write an app on top of them, as developers have to deal with all kinds of data consistency/fault tolerance issues in their application layer.
I would say for a small application you're probably better off with a SQL-based relational database. Whilst in theory much more expensive, being realistic at a small scale that price trades off as a much simpler system to work with.
If, however, your startup is a multi-tenant solution that needs to deal with a lot of writes, I'd look carefully at alternatives.
Ok where I work we have a fairly substantial number of systems written over the last couple of decades that we maintain.
The systems are diverse in that multiple operating systems (Linux, Solaris, Windows), multiple databases (several versions of Oracle, Sybase, and MySQL), and even multiple languages (C, C++, JSP, PHP, and a host of others) are used.
Each system is fairly autonomous, even at the cost of entering the same data into multiple systems.
Management recently decided that we should investigate what it will take to get all the systems happily talking to each other and sharing data.
Keep in mind that while we can make software changes to any of the individual systems, a complete rewrite of any one system (or more) is not something management is likely to entertain.
The first thought of several of the developers here was the straightforward one: if system A needs data from system B, it should just connect to system B's database and get it. Likewise, if it needs to give B data, it should just insert it into B's database.
Due to the mess of databases (and versions) used, other developers were of the opinion that we should have one new database, combining the tables from all the other systems to avoid having to juggle multiple connections. By doing this they hope that we might be able to consolidate some tables and get rid of the redundant data entry.
This is about the time I was brought in for my opinion on the whole mess.
The whole idea of using the database as a means of system communication smells funny to me. Business logic will have to be placed into multiple systems (if System A wants to add data to System B, it had better understand B's rules concerning the data before doing the insert). Several systems will most likely have to do some form of database polling to find any changes to their data. Continuing maintenance will be a headache, as any change to a database schema now propagates to several systems.
My first thought was to take the time and write APIs/Services for the different systems, which once written could be easily used to pass/retrieve data back and forth. A lot of the other developers feel that is excessive and far more work than just using the database.
So what would be the best way to go about getting these systems to talk to each other?
Integrating disparate systems is my day job.
If I were you, I would go to great effort to avoid accessing System A's data from directly within System B. Updating System A's database from System B is extremely unwise. It is exactly the opposite of good practice to make your business logic so diffuse. You will end up regretting it.
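A small sketch of why this matters: when System B exposes a service, its rules stay in one place, and System A never needs to know them. The two classes, the order fields, and the validation rules below are hypothetical and stand in for whatever the real systems do.

```python
class SystemB:
    """Owns its data and the rules that guard it (hypothetical example system)."""

    def __init__(self):
        self._orders = {}   # private: other systems never touch this directly

    def add_order(self, order_id, quantity):
        # B's business rules live here, and only here.
        if order_id in self._orders:
            raise ValueError(f"order {order_id} already exists")
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self._orders[order_id] = quantity

    def get_order(self, order_id):
        return self._orders[order_id]


class SystemA:
    """Hands data to B through B's service instead of inserting into B's tables."""

    def __init__(self, b_service):
        self._b = b_service

    def forward_order(self, order_id, quantity):
        try:
            self._b.add_order(order_id, quantity)
            return True
        except ValueError:
            return False   # B rejected the data; A does not need to know why in advance


b = SystemB()
a = SystemA(b)
accepted = a.forward_order("o-1", 5)
rejected = a.forward_order("o-2", 0)
```

With a direct insert into B's database, the `quantity <= 0` check would have to be duplicated in A (and in every other system that writes to B), which is exactly the diffusion of business logic being warned against.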
The idea of the central database isn't necessarily bad ... but the amount of effort involved is probably within an order of magnitude of rewriting the systems from scratch. It is certainly not something I would attempt, at least in the form you describe. It can succeed, but it is much, much harder and it takes a lot more discipline than the point-to-point integration approach. It's funny to hear it suggested in the same breath as the 'cowboy' approach of just shoving data directly into other systems.
Overall your instincts seem pretty good. There are a couple of approaches. You mention one: implementing services. That's not a bad way to go, especially if you need updates in real time. The other is a separate integration application that is responsible for shuffling the data around. That's the approach I usually take, but usually because I can't change the systems I'm integrating so that they ask for the data they need; I have to push the data in. In your case the services approach isn't a bad one.
One thing I would like to say that might not be obvious to someone coming to system integration for the first time is that every piece of data in your system should have a single, authoritative point of truth. If the data is duplicated (and it is duplicated), and the copies disagree with each other, the copy in the point of truth for that data must be taken to be correct. There is just no other way to integrate systems without having the complexity scream skyward at an exponential rate. Spaghetti integration is like spaghetti code, and it should be avoided at all costs.
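One way to picture the "single point of truth" rule is a map from each piece of data to the system that owns it, consulted whenever copies disagree. The sketch below assumes two made-up systems (`crm` and `billing`) and two made-up fields; it is an illustration of the rule, not a prescription for how to store the mapping.

```python
# Which system is authoritative for which field (hypothetical assignment).
AUTHORITY = {"email": "crm", "balance": "billing"}


def reconcile(copies):
    """Build one record, taking each field from the system that owns it.

    `copies` maps a system name to that system's copy of the record.
    """
    merged = {}
    for field_name, owner in AUTHORITY.items():
        merged[field_name] = copies[owner][field_name]
    return merged


copies = {
    "crm": {"email": "new@example.com", "balance": 0},
    "billing": {"email": "old@example.com", "balance": 42},
}
merged = reconcile(copies)
```

The copies disagree on both fields, but there is no ambiguity about the result because ownership was decided up front rather than per conflict.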
Good luck.
EDIT:
Middleware addresses the problem of transport, but that is not the central problem in integration. If the systems are close enough together that one app can shove data directly into another, they're probably close enough that a service offered by one can be called directly by another. I wouldn't recommend middleware in your case. You might get some benefit from it, but that would be outweighed by the increased complexity. You need to solve one problem at a time.
Sounds like you may want to investigate Message Queuing and message-oriented middleware.
MSMQ and Java Message Service being examples.
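For readers who have not used message queuing, the basic shape is that the sending system drops a message on a queue and moves on, while the receiving system picks it up when it is ready. The sketch below uses Python's in-process `queue.Queue` purely as a stand-in; it is not MSMQ or JMS, which add durability and delivery across processes and machines. The message fields are invented.

```python
import queue
import threading

channel = queue.Queue()   # stand-in for a message broker queue
received = []


def consumer():
    """The receiving system: handles messages one at a time, in arrival order."""
    while True:
        message = channel.get()
        if message is None:   # sentinel used in this sketch to stop the worker
            break
        received.append(message["type"])


worker = threading.Thread(target=consumer)
worker.start()

# The sending system publishes events without knowing who consumes them.
channel.put({"type": "customer.updated", "id": 7})
channel.put({"type": "order.created", "id": 12})
channel.put(None)
worker.join()
```

The sender and receiver only share the message format, not a database schema, which is the decoupling this suggestion is after.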
It seems you are looking for opinions, so I will provide mine.
I agree with the other developers that writing an API for all the different systems is excessive. You would likely get it done faster and have much more control over it if you just take the other suggestion of creating a single database.
One of the challenges that you will have is to align the data in each of the different systems so that it can be integrated in the first place. It may be that each of the systems that you want to integrate holds entirely different sets of data, but more likely the data overlaps. Before diving into writing APIs (which is the route I would take as well, given your description), I would recommend that you try to come up with a logical data model for the data that needs to be integrated. This data model will then help you leverage the data that you have in the different systems and make it more useful to the other databases.
I would also highly recommend an iterative approach to the integration. With legacy systems there is so much uncertainty that trying to design and implement it all in one go is too risky. Start small and work your way to a reasonably integrated system. "Fully integrated" is hardly ever worth aiming for.
Directly interfacing by pushing/poking databases exposes a lot of internal detail of one system to another. There are obvious disadvantages: upgrading one system can break the other. Moreover, there can be technical limitations in how one system can access the database of the other (consider how an application written in C on Unix will interact with a SQL Server 2005 database running on Windows 2003 Server).
The first thing you have to decide is the platform where the "master database" will reside, and the same for the middleware providing the much-needed glue. Instead of going towards API-level middleware integration (such as CORBA), I would suggest considering Message Oriented Middleware. MS Biztalk, Sun's eGate and Oracle's Fusion are some of the options.
Your idea of a new database is a step in the right direction. You might like to read a little about the Enterprise Entity Aggregation pattern.
A combination of "data integration" with a middleware is the way to go.
If you are going towards Middleware + Single Central Database strategy, you might want to consider achieving this in multiple phases. Here's a logical stepped process which can be considered:
Implementation of services/APIs for different systems which expose the functionality for each system
Implementation of Middleware which accesses these APIs and provides an interface to all the systems to access the data/services from other systems (accesses data from central source if available, else gets it from another system)
Implementation of Central Database only, without data
Implementation of Caching/Data-Storage Services at the Middleware level which can store/cache data in the central database whenever that data is accessed from any of the Systems e.g. IF System A's records 1-5 are fetched by System B through Middleware, the Middleware Data Caching Services can store these records in the centralized database and the next time these records will be fetched from the central database
Data Cleansing can happen in Parallel
You can also create an import mechanism to push data from multiple systems to the central database on a daily basis (automated or manual)
This way, the effort is distributed across multiple milestones and data is gradually stored in the central database on first-accessed-first-stored basis.
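The "first-accessed-first-stored" step can be sketched as a read-through cache sitting in the middleware. Everything here is hypothetical: the `Middleware` class, the dictionary standing in for the central database, and the `fetch_from_a` function standing in for System A's API.

```python
class Middleware:
    """Serves records from the central store when present, else from the owning system."""

    def __init__(self, source_systems):
        self._sources = source_systems   # system name -> function that fetches a record
        self.central = {}                # stand-in for the (initially empty) central database

    def fetch(self, system, record_id):
        key = (system, record_id)
        if key in self.central:
            return self.central[key]              # already stored centrally
        record = self._sources[system](record_id)  # go to the source system's API
        self.central[key] = record                 # store on first access
        return record


calls = []


def fetch_from_a(record_id):
    # Hypothetical API of System A; records which ids were actually requested from it.
    calls.append(record_id)
    return {"id": record_id, "owner": "system A"}


middleware = Middleware({"A": fetch_from_a})
first = middleware.fetch("A", 1)    # fetched from System A, then stored centrally
second = middleware.fetch("A", 1)   # served from the central store
```

The sketch ignores what happens when System A later changes record 1; a real implementation would need some form of invalidation or refresh so the central copy does not go stale.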