Complex queries vs. storing more data - MongoDB

I'm having a hard time deciding on the better approach when figuring out what I want to have in my application. I mostly use MongoDB and work on web applications, if that makes the answer more specific.
I wonder what will be a better approach:
Store as little data as possible in Mongo's collections, and implement my features mainly with server-side logic and calculations against the stored data.
Store everything I need for a feature, to avoid complex queries and server-side logic, even though it fills my Mongo with lots of information.
Because of my lack of experience (I'm only beginning my way as a web developer), I can't figure out which approach I should take. I thought about taking the second one while prototyping, but switching once I start to scale might be too much effort. Any suggestions?

I think you're missing a 3rd option:
Store little data in the collection, do calculations on the client.
The beauty of Meteor comes with the tech shift to thick clients. Back in the day, primarily prior to V8, browsers couldn't do much. Now they can do a crazy amount of work. By pushing the work to your clients, you take processing workload off your server, you don't have to save calculated fields to your DB, and often client processing time < server processing time + transmission time.
If you can't do option #3, lean towards #2. Storage is generally cheaper (and definitely faster) than processing power, but it depends on the complexity of the query. If storage cost < processing cost, store it.
array.toString()? calculate it. Neural network simulation? store it.
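As a rough sketch of that trade-off in mongo-shell terms (the posts collection, the postId variable and the field names are made up for illustration): a cheap derived value can simply be calculated at read time, while a value you query constantly may be worth storing and maintaining with $inc.

// Option #2: store the derived value alongside the raw data.
db.posts.update(
  { _id: postId },
  { $push: { comments: { user: "alice", text: "Nice post!" } },
    $inc:  { commentCount: 1 } }
);

// Option #1/#3: skip commentCount and derive it at read time,
// on the server or on the client.
var post = db.posts.findOne({ _id: postId });
var commentCount = post.comments.length;   // cheap: calculate it, don't store it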

Related

Will I typically get better performance if I run an update/calc loop via JavaScript?

I have a script that loops over a set of records, performs some statistical calculations and updates the records. It's a big cursor: get record, calculate statistics from embedded documents, set fields on record, save record. There are <5k records being looped over, and each one embeds 90 history entries.
Question: would I get substantially better performance if I did this via JavaScript? The alternative is writing it in Ruby. My (unfounded) opinion is that since this can be done entirely in the database, I will get better performance if I send a chunk of JS to MongoDB instead of adding Ruby into the mix.
Related: is map/reduce appropriate for finding the median and mode of a set of values for many records?
The answer is really "it depends" - if the fields you need to do the calculations are very large, doing the calculation on the server side with JS might be a lot faster simply by cutting down on network traffic.
But, executing JS on the server side also holds a write lock, so depending on how complicated the calculations are, it might be more efficient to just do your calculations on the client side and then simply update the document.
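For the calculate-on-the-client-then-update route, a minimal mongo-shell sketch might look like the following (the history and avgValue field names, and the average itself, are just stand-ins for whatever statistics you actually compute):

// Iterate the records, compute a statistic from the embedded
// history entries, and write the result back with $set.
db.records.find().forEach(function (rec) {
  var sum = 0;
  rec.history.forEach(function (h) { sum += h.value; });
  var avg = rec.history.length ? sum / rec.history.length : 0;

  db.records.update(
    { _id: rec._id },
    { $set: { avgValue: avg } }
  );
});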
Your best bet is to do a simple benchmark of Ruby vs. server-side JS. If you need to serve other database traffic at the same time, that should be considered as well, because your lock % could be different in the two scenarios (you can monitor this with mongostat).
Also, keep in mind that using db.eval will not work with sharding, so avoid it if you are using a sharded environment or plan to in the future.

NoSQL & AdHoc Queries - Millions of Rows

I currently run a MySQL-powered website where users promote advertisements and gain revenue every time someone completes one. We log every time someone views an ad ("impression"), every time a user clicks an ad ("click"), and every time someone completes an ad ("lead").
Since we get so much traffic, we have millions of records in each of these respective tables. We then have to query these tables to let users see how much they have earned, so we end up performing multiple queries on tables with millions and millions of rows multiple times in one request, hundreds of times concurrently.
We're looking to move away from MySQL to a key-value store or something along those lines. We need something that will let us store all these millions of rows, query them in milliseconds, and MOST IMPORTANTLY, run ad-hoc queries where we can query any single column, so we could do things like:
FROM leads WHERE country = 'US' AND user_id = 501 (the NoSQL equivalent, obviously)
FROM clicks WHERE ad_id = 1952 AND user_id = 200 AND country = 'GB'
etc.
Does anyone have any good suggestions? I was considering MongoDB or CouchDB, but I'm not sure if they can handle querying millions of records multiple times a second and the type of ad-hoc queries we need.
Thanks!
With those requirements, you are probably better off sticking with SQL and setting up replication/clustering if you are running into load issues. You can set up indexing on a document database so that those queries are possible, but you don't really gain anything over your current system.
NoSQL systems generally improve performance by leaving out some of the more complex features of relational systems. This means that they will only help if your scenario doesn't require those features. Running ad hoc queries on tabular data is exactly what SQL was designed for.
CouchDB's map/reduce is incremental which means it only processes a document once and stores the results.
Let's assume, for a moment, that CouchDB is the slowest database in the world. Your first query with millions of rows takes, maybe, 20 hours. That sounds terrible. However, your second query, your third query, your fourth query, and your hundredth query will take 50 milliseconds, perhaps 100 including HTTP and network latency.
You could say CouchDB fails the benchmarks but gets honors in the school of hard knocks.
I would not worry about performance, but rather if CouchDB can satisfy your ad-hoc query requirements. CouchDB wants to know what queries will occur, so it can do the hard work up-front before the query arrives. When the query does arrive, the answer is already prepared and out it goes!
All of your examples are possible with CouchDB. A so-called merge-join (lots of equality conditions) is no problem. However CouchDB cannot support multiple inequality queries simultaneously. You cannot ask CouchDB, in a single query, for users between age 18-40 who also clicked fewer than 10 times.
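As a quick sketch of how one of those equality queries maps onto a CouchDB view (a hypothetical design document, not something from the question), you emit a composite key and then query it exactly:

// map function of a hypothetical "clicks/by_ad_user_country" view
function (doc) {
  if (doc.type === "click") {
    emit([doc.ad_id, doc.user_id, doc.country], 1);
  }
}

// Paired with the built-in _count reduce, the view can be queried over HTTP:
//   GET /tracking/_design/clicks/_view/by_ad_user_country?key=[1952,200,"GB"]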
The nice thing about CouchDB's HTTP and JavaScript interface is that it's easy to do a quick feasibility study. I suggest you try it out!
Most people would probably recommend MongoDB for a tracking/analytics system like this, for good reasons. You should read the "MongoDB for Real-Time Analytics" chapter from the "MongoDB Definitive Guide" book. Depending on the size of your data and your scaling needs, you could get all the performance, schema-free storage and ad-hoc querying features. You will need to decide for yourself whether issues with the durability and unpredictability of the system are too risky for you or not.
For a simpler tracking system, Redis would be a very good choice, offering rich functionality, blazing speed and real durability. To get a feel for how such a system would be implemented in Redis, see this gist. The downside is that you'd need to define all the "indices" yourself, rather than getting them for "free", as is the case with MongoDB. Nevertheless, there's no free lunch, and MongoDB indices are definitely not a free lunch either.
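To make the define-the-indices-yourself point concrete, here is a minimal Node.js sketch with the node-redis client (the key names are invented for illustration): every counter you want to query later has to be bumped explicitly at write time.

import { createClient } from "redis";

const client = createClient();
await client.connect();

// Record a click and maintain the "indices" by hand:
// one counter per ad, per user, and per user+country combination.
async function recordClick(adId, userId, country) {
  await Promise.all([
    client.incr(`clicks:ad:${adId}`),
    client.incr(`clicks:user:${userId}`),
    client.incr(`clicks:user:${userId}:country:${country}`),
  ]);
}

await recordClick(1952, 200, "GB");
console.log(await client.get("clicks:user:200:country:GB")); // "1"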
I think you should have a look at what ElasticSearch would give you:
Blazing speed
Schema-free storage
Sharding and distributed architecture
Powerful analytic primitives in the form of facets
Easy implementation of "sliding window"-type data storage with index aliases
It is at heart a "full-text search engine", but don't let that confuse you. Read the "Data Visualization with ElasticSearch and Protovis" article for a real-world use case of ElasticSearch as a data mining engine.
Have a look at these slides for a real-world use case of the "sliding window" scenario.
There are many client libraries for ElasticSearch available, such as Tire for Ruby, so it's easy to get off the ground with a prototype quickly.
For the record (with all due respect to @jhs :), based on my experience, I cannot imagine an implementation where CouchDB is a feasible and useful option here. It would be an awesome backup store for your data, though.
If your working set can fit in memory and you index the right fields in the documents, you'd be all set. What you're asking for isn't all that exotic, and I am sure that with proper hardware, the right collection design (denormalize!) and indexing, you should be good to go. Read up on Mongo querying, and use explain() to test your queries. Stay away from IN and NOT IN clauses; that'd be my suggestion.
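A minimal mongo-shell sketch of that advice, using the fields from the question (the index definitions are just one reasonable choice, not the only one):

// Compound indexes matching the ad-hoc queries listed in the question
db.leads.createIndex({ user_id: 1, country: 1 });
db.clicks.createIndex({ ad_id: 1, user_id: 1, country: 1 });

// Verify the indexes are actually used before trusting the latency
db.leads.find({ country: "US", user_id: 501 }).explain();
db.clicks.find({ ad_id: 1952, user_id: 200, country: "GB" }).explain();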
It really depends on your data sets. The number one rule of NoSQL design is to define your query scenarios first. Once you really understand how you want to query the data, you can look into the various NoSQL solutions out there. The default unit of distribution is the key, so you need to be able to split your data between your node machines effectively; otherwise you will end up with a horizontally scalable system with all the work still being done on one node (albeit with better queries, depending on the case).
You also need to think back to the CAP theorem: most NoSQL databases are eventually consistent (CP or AP), while traditional relational DBMSs are CA. This will impact the way you handle data and how you create certain things; for example, key generation can become tricky.
Also remember that in some systems, such as HBase, there is no indexing concept. All your indexes will need to be built by your application logic, and any updates and deletes will need to be managed as such. With Mongo you can actually create indexes on fields and query them relatively quickly; there is also the possibility of integrating Solr with Mongo. You don't have to query only by ID in Mongo, as you do in HBase, which is a column-family store (aka a Google BigTable-style database) where you essentially have nested key-value pairs.
So once again it comes down to your data: what you want to store, how you plan to store it, and most importantly how you want to access it. The Lily project looks very promising. In the work I am involved with, we take a large amount of data from the web and we store it, analyse it, strip it down, parse it, analyse it, stream it, update it, etc. We don't just use one system but many that are best suited to the job at hand. For this process we use different systems at different stages, as this gives us fast access where we need it, provides the ability to stream and analyse data in real time and, importantly, lets us keep track of everything as we go (data loss in a prod system is a big deal). I am using Hadoop, HBase, Hive, MongoDB, Solr, MySQL and even good old text files. Remember that productionizing a system using these technologies is a bit harder than installing MySQL on a server; some releases are not as stable, and you really need to do your testing first. At the end of the day it really depends on the level of business resistance and the mission-critical nature of your system.
Another path that no one has mentioned so far is NewSQL, i.e. horizontally scalable RDBMSs. There are a few out there, like MySQL Cluster (I think) and VoltDB, which may suit your cause.
Again it comes down to understanding your data and the access patterns. NoSQL systems are also non-relational and are therefore better suited to non-relational data sets. If your data is inherently relational and you need SQL query features that really have to do things like Cartesian products (aka joins), then you may well be better off sticking with Oracle and investing some time in indexing, sharding and performance tuning.
My advice would be to actually play around with a few different systems. However, for your use case I think a column-family database may be the best solution; a few places have implemented similar solutions to very similar problems (I think the NYTimes is using HBase to monitor user page clicks). Another great example is Facebook's real-time analytics; they are using HBase for it. There is a really good article here which may help you along your way and further explain some points above: http://highscalability.com/blog/2011/3/22/facebooks-new-realtime-analytics-system-hbase-to-process-20.html
A final point would be that NoSQL systems are not the be-all and end-all. Putting your data into a NoSQL database does not mean it's going to perform any better than MySQL, Oracle or even text files... For example, see this blog post: http://mysqldba.blogspot.com/2010/03/cassandra-is-my-nosql-solution-but.html
I'd have a look at:
MongoDB - Document - CP
CouchDB - Document - AP
Redis - In memory key-value (not column family) - CP
Cassandra - Column Family - Available & Partition Tolerant (AP)
HBase - Column Family - Consistent & Partition Tolerant (CP)
Hadoop/Hive - Also have a look at Hadoop streaming...
Hypertable - Another CF CP DB.
VoltDB - A really good-looking product: a relational database that is distributed and might work for your case (it may be an easier move). They also seem to provide enterprise support, which may be more suited for a prod env (i.e. it gives business users a sense of security).
Anyway, that's my 2c. Playing around with the systems is really the only way you're going to find out what really works for your case.

Example of a task that a NoSQL database can't handle (if any)

I would like to test the NoSQL world. This is just curiosity, not an absolute need (yet).
I have read a few things about the differences between SQL and NoSQL databases. I'm convinced about the potential advantages, but I'm a little worried about cases where NoSQL is not applicable. If I understand correctly, NoSQL databases essentially lack ACID properties.
Can someone give an example of a real-world operation (for example on an e-commerce site, in a scientific application, or...) that an ACID relational database can handle but where a NoSQL database could fail miserably, either systematically with some kind of race condition or because of a power outage, etc.?
The perfect example would be something where there can't be any workaround without modifying the database engine. Examples where a NoSQL database just performs poorly can eventually be another question; here I would like to see when, theoretically, we just can't use such technology.
Maybe finding such an example is database specific. If this is the case, let's take MongoDB to represent the NoSQL world.
Edit:
To clarify this question: I don't want a debate about which kind of database is better for certain cases. I want to know whether this technology can be an absolute dead end in some cases, because no matter how hard we try, some features that a SQL database provides cannot be implemented on top of NoSQL stores.
Since there are many NoSQL stores available, I can accept picking an existing NoSQL store as a reference, but what interests me most is the minimum subset of features a store should provide to be able to implement higher-level features (e.g. can transactions be implemented with a store that doesn't provide X...).
This question is a bit like asking what kind of program cannot be written in an imperative/functional language. Any Turing-complete language can express every program that can be solved by a Turing machine. The question is: do you, as a programmer, really want to write an accounting system for a Fortune 500 company in non-portable machine instructions?
In the end, NoSQL can do anything SQL-based engines can; the difference is that you as a programmer may be responsible for logic in something like Redis that MySQL gives you for free. SQL databases take a very conservative view of data integrity. The NoSQL movement relaxes those standards to gain better scalability and to make tasks that are common to web applications easier.
MongoDB (my current preference) makes replication and sharding (horizontal scaling) easy and inserts very fast, and it drops the requirement for a strict schema. In exchange, users of MongoDB must code around slower queries when an index is not present, implement transactional logic in the app (perhaps with three-phase commits), and take a hit on storage efficiency.
CouchDB has similar trade-offs but also sacrifices ad-hoc queries for the ability to work with data off-line then sync with a server.
Redis and other key-value stores require the programmer to write much of the index and join logic that is built into SQL databases. In exchange, an application can leverage domain knowledge about its data to make indexes and joins more efficient than the general solution SQL would require. Redis also requires all data to fit in RAM, but in exchange gives performance on par with Memcache.
In the end, you really can do everything MySQL or Postgres does with nothing more than the OS file system calls (after all, that is how the people who wrote those database engines did it). It all comes down to what you want the data store to do for you and what you are willing to give up in return.
Good question. First a clarification. While the field of relational stores is held together by a rather solid foundation of principles, with each vendor choosing to add value in features or pricing, the non-relational (nosql) field is far more heterogeneous.
There are document stores (MongoDB, CouchDB) which are great for content management and similar situations where you have a flat set of variable attributes that you want to build around a topic. Take site-customization. Using a document store to manage custom attributes that define the way a user wants to see his/her page is well suited to the platform. Despite their marketing hype, these stores don't tend to scale into terabytes that well. It can be done, but it's not ideal. MongoDB has a lot of features found in relational databases, such as dynamic indexes (up to 40 per collection/table). CouchDB is built to be absolutely recoverable in the event of failure.
There are key/value stores (Cassandra, HBase...) that are great for highly distributed storage: Cassandra for low latency, HBase for higher latency. The trick with these is that you have to define your query needs before you start putting data in. They're not efficient for dynamic queries against any attribute. For instance, if you are building a customer event logging service, you'd want to set your key on the customer's unique attribute. From there, you could push various log structures into your store and retrieve all logs by customer key on demand. It would be far more expensive, however, to try to go through the logs looking for log events where the type was "failure", unless you decided to make that your secondary key. One other thing: the last time I looked at Cassandra, you couldn't run a regexp inside the M/R query. That means that if you wanted to look for patterns in a field, you'd have to pull all instances of that field and run each through a regexp to find the tuples you wanted.
Graph databases are very different from the two above. Relations between items (objects, tuples, elements) are fluid. They don't scale into terabytes, but that's not what they are designed for. They are great for asking questions like "hey, how many of my users like the color green? Of those, how many live in California?" With a relational database, you would have a static structure. With a graph database (I'm oversimplifying, of course), you have attributes and objects. You connect them as makes sense, without schema enforcement.
I wouldn't put anything critical into a non-relational store. Commerce, for instance, where you want guarantees that a transaction is complete before delivering the product. You want guaranteed integrity (or at least the best chance of guaranteed integrity). If a user loses his/her site-customization settings, no big deal. If you lose a commerce transaction, big deal. There may be some who disagree.
I also wouldn't put complex structures into any of the above non-relational stores. They don't do joins well at scale. And that's okay, because it's not the way they're supposed to work. Where you might put an identity for address_type into a customer_address table in a relational system, you would want to embed the address_type information in a customer tuple stored in a document or key/value store. Data efficiency is not the domain of the document or key/value store. The point is distribution and pure speed. The sacrifice is footprint.
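A small illustration of that embedding, in mongo-shell terms (the collection and field names are invented): instead of separate customer_address and address_type tables, the addresses and their types live inside the customer document and come back in one read.

db.customers.insert({
  _id: 501,
  name: "Alex",
  addresses: [
    { type: "billing",  street: "1 Main St", country: "US" },
    { type: "shipping", street: "9 Dock Rd", country: "US" }
  ]
});

// One round trip, no join:
db.customers.findOne({ _id: 501 });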
There are other subtypes of the family of stores labeled "nosql" that I haven't covered here. There are a ton (122 at last count) of different projects focused on non-relational solutions to data problems of various types. Riak is yet another one that I keep hearing about and can't wait to try out.
And here's the trick. The big-dollar relational vendors have been watching and chances are, they're all building or planning to build their own non-relational solutions to tie in with their products. Over the next couple years, if not sooner, we'll see the movement mature, large companies buy up the best of breed and relational vendors start offering integrated solutions, for those that haven't already.
It's an extremely exciting time to work in the field of data management. You should try a few of these out. You can download Couch or Mongo and have them up and running in minutes. HBase is a bit harder.
In any case, I hope I've informed without confusing, and that I have enlightened without significant bias or error.
RDBMSes are good at joins, NoSQL engines usually aren't.
NoSQL engines are good at distributed scalability, RDBMSes usually aren't.
RDBMSes are good at data validation constraints, NoSQL engines usually aren't.
NoSQL engines are good at flexible and schema-less approaches, RDBMSes usually aren't.
Both approaches can solve either set of problems; the difference is in efficiency.
Probably the answer to your question is that MongoDB can handle any task (and so can SQL). But in some cases it's better to choose MongoDB, and in others a SQL database. You can read about the advantages and disadvantages here.
Also, as @Dmitry said, MongoDB opens the door to easy horizontal and vertical scaling with replication & sharding.
RDBMSs enforce strong consistency, while most NoSQL stores are eventually consistent. So at a given point in time, data read from a NoSQL DB might not represent the most up-to-date copy of that data.
A common example is a bank transaction: when a user withdraws money, node A is updated with this event; if node B is queried for this user's balance at the same time, it can return an outdated balance. This can't happen in an RDBMS, as the consistency property guarantees that data is updated before it can be read.
RDBMSs are really good at quickly aggregating sums, averages, etc. from tables, e.g. SELECT SUM(x) FROM y WHERE z. That is surprisingly hard to do in most NoSQL databases if you want an answer at once. Some NoSQL stores provide map/reduce as a way of solving the same problem, but it is not real-time in the way it is in the SQL world.
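For example, a rough mongo-shell equivalent of SELECT SUM(x) FROM y WHERE z via map/reduce might look like this (the y collection and the x/z fields are just placeholders), and it runs as a batch job rather than instantly:

var mapFn = function () { emit("total", this.x); };
var reduceFn = function (key, values) {
  var sum = 0;
  values.forEach(function (v) { sum += v; });
  return sum;
};

// query acts as the WHERE clause; out: {inline: 1} returns the result directly
db.y.mapReduce(mapFn, reduceFn, { query: { z: "some-value" }, out: { inline: 1 } });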

MongoDB vs. Redis vs. Cassandra for a fast-write, temporary row storage solution

I'm building a system that tracks and verifies ad impressions and clicks. This means that there are a lot of insert commands (about 90/second average, peaking at 250) and some read operations, but the focus is on performance and making it blazing-fast.
The system is currently on MongoDB, but I've been introduced to Cassandra and Redis since then. Would it be a good idea to go to one of these two solutions, rather than stay on MongoDB? Why or why not?
Thank you
For a harvesting solution like this, I would recommend a multi-stage approach. Redis is good at real-time communication. It is designed as an in-memory key/value store and inherits some very nice benefits of being an in-memory database: O(1) list operations. As long as there is RAM to spare on the server, Redis will not slow down when pushing to the end of your lists, which is good when you need to insert items at such an extreme rate. Unfortunately, Redis can't operate with data sets larger than the amount of RAM you have (it only writes to disk for persistence; reads from disk happen when restarting the server or after a crash), and scaling has to be done by you and your application. (A common way is to spread keys across numerous servers, which is implemented by some Redis drivers, especially those for Ruby on Rails.) Redis also has support for simple publish/subscribe messaging, which can be useful at times as well.
In this scenario, Redis is "stage one." For each specific type of event you create a list in Redis with a unique name; for example, we have "page viewed" and "link clicked." For simplicity we want to make sure the data in each list has the same structure: "link clicked" may have a user token, link name and URL, while "page viewed" may only have the user token and URL. Your first concern is just recording the fact that it happened and pushing whatever data is absolutely necessary.
Next, some simple processing workers take this frantically inserted information off of Redis' hands by asking it to take an item off the end of the list and hand it over. The worker can make any adjustments/deduplication/ID lookups needed to properly file the data and hand it off to a more permanent storage site. Fire up as many of these workers as you need to keep Redis' memory load bearable. You can write the workers in anything you wish (Node.js, C#, Java, ...) as long as it has a Redis driver (most web languages do now) and one for your desired storage (SQL, Mongo, etc.).
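A stripped-down sketch of that pipeline in Node.js with node-redis (the list name, event shape and store object are just examples): the web tier pushes raw events, and a separate worker drains them into permanent storage.

import { createClient } from "redis";

const client = createClient();
await client.connect();

// "Stage one": the web tier just records the fact that the event happened.
async function trackLinkClicked(userToken, linkName, url) {
  await client.rPush("link_clicked", JSON.stringify({ userToken, linkName, url }));
}

// A worker drains the list and hands events to permanent storage
// (store.save is a placeholder for your Mongo/SQL writer).
async function drainOnce(store) {
  let raw;
  while ((raw = await client.lPop("link_clicked")) !== null) {
    await store.save(JSON.parse(raw)); // dedupe / ID lookups would go here
  }
}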
MongoDB is good at document storage. Unlike Redis, it is able to deal with databases larger than RAM, and it supports sharding/replication on its own. An advantage of MongoDB over SQL-based options is that you don't have to have a predetermined schema; you're free to change the way data is stored however you want at any time.
I would, however, suggest Redis or Mongo for the "step one" phase of holding data for processing and use a traditional SQL setup (Postgres or MSSQL, perhaps) to store post-processed data. Tracking client behavior sounds like relational data to me, since you may want to go "Show me everyone who views this page" or "How many pages did this person view on this given day" or "What day had the most viewers in total?". There may be even more complex joins or queries for analytic purposes you come up with, and mature SQL solutions can do a lot of this filtering for you; NoSQL (Mongo or Redis specifically) can't do joins or complex queries across varied sets of data.
I currently work for a very large ad network and we write to flat files :)
I'm personally a Mongo fan, but frankly, Redis and Cassandra are unlikely to perform either better or worse. I mean, all you're doing is throwing stuff into memory and then flushing to disk in the background (both Mongo and Redis do this).
If you're looking for blazing-fast speed, the other option is to keep several impressions in local memory and then flush them to disk every minute or so. Of course, this is basically what Mongo and Redis do for you. Not a really compelling reason to move.
All three solutions (four if you count flat-files) will give you blazing fast writes. The non-relational (nosql) solutions will give you tunable fault-tolerance as well for the purposes of disaster recovery.
In terms of scale, our test environment, with only three MongoDB nodes, can handle 2-3k mixed transactions per second. At 8 nodes, we can handle 12k-15k mixed transactions per second. Cassandra can scale even higher. 250 reads is (or should be) no problem.
The more important question is, what do you want to do with this data? Operational reporting? Time-series analysis? Ad-hoc pattern analysis? real-time reporting?
MongoDB is a good option if you want the ability to do ad-hoc analysis based on multiple attributes within a collection. You can put up to 40 indexes on a collection, though the indexes will be stored in-memory, so watch for size. But the result is a flexible analytical solution.
Cassandra is a key-value store. You define a static column or set of columns that will act as your primary index right up front. All queries run against Cassandra should be tuned to this index. You can put a secondary index on it, but that's about as far as it goes. You can, of course, use MapReduce to scan the store for non-key attributes, but it will be just that: a serial scan through the store. Cassandra also doesn't have the notion of "like" or regex operations on the server nodes. If you want to find all customers whose first name starts with "Alex", you'll have to scan through the entire collection, pull the first name out for each entry and run it through a client-side regex.
I'm not familiar enough with Redis to speak intelligently about it. Sorry.
If you are evaluating non-relational platforms, you might also want to consider CouchDB and Riak.
Hope this helps.
Just found this: http://blog.axant.it/archives/236
Quoting the most interesting part:
This second graph is about Redis RPUSH vs Mongo $PUSH vs Mongo insert, and I find this graph to be really interesting. Up to 5000 entries mongodb $push is faster even when compared to Redis RPUSH, then it becomes incredibly slow, probably because the mongodb array type has linear insertion time and so it becomes slower and slower. mongodb might gain a bit of performance by exposing a constant-time insertion list type, but even with the linear-time array type (which can guarantee constant-time look-up) it has its applications for small sets of data.
I guess everything depends at least on data type and volume. The best advice would probably be to benchmark on your typical dataset and see for yourself.
According to Benchmarking Top NoSQL Databases (download here), I recommend Cassandra.
If you have the choice (and need to move away from flat files) I would go with Redis. It's blazing fast, will comfortably handle the load you're talking about, and more importantly you won't have to manage the flushing/IO code. I understand it's pretty straightforward, but less code to manage is better than more.
You will also get horizontal scaling options with Redis that you may not get with file based caching.
I can get around 30k inserts/sec with MongoDB on a simple $350 Dell. If you only need around 2k inserts/sec, I would stick with MongoDB and shard it for scalability. Maybe also look into doing something with Node.js or something similar to make things more asynchronous.
The problem with inserts into databases is that they usually require writing to a random block on disk for each insert. What you want is something that only writes to disk every 10 inserts or so, ideally to sequential blocks.
Flat files are good. Summary statistics (eg total hits per page) can be obtained from flat files in a scalable manner using merge-sorty map-reducy type algorithms. It's not too hard to roll your own.
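A minimal Node.js sketch of that buffer-then-append idea (the file name and batch size are arbitrary): instead of one random write per insert, impressions are batched and appended sequentially to a flat file.

import { appendFileSync } from "fs";

const buffer = [];
const BATCH_SIZE = 10;

function logImpression(event) {
  buffer.push(JSON.stringify(event));
  if (buffer.length >= BATCH_SIZE) flush();
}

function flush() {
  if (buffer.length === 0) return;
  // One sequential append instead of a random disk write per insert.
  appendFileSync("impressions.log", buffer.join("\n") + "\n");
  buffer.length = 0;
}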
SQLite now supports Write Ahead Logging, which may also provide adequate performance.
I have hands-on experience with MongoDB, CouchDB and Cassandra. I converted a lot of files to base64 strings and inserted those strings into each store.
MongoDB was the fastest, Cassandra the slowest, and CouchDB was slow too.
I think MySQL would be much faster than all of them, but I haven't tried MySQL for my test case yet.

Reasons for & against a Database

I had a discussion with a coworker about the architecture of a program I'm writing and I'd like some more opinions.
The Situation:
The program should update in near real time (+/- 1 minute).
It involves the movement of objects on a coordinate system.
There are some events that occur at regular intervals (i.e. creation of the objects).
Movements can change at any time through user input.
My solution was:
Build a server that runs continuously and stores the data internally.
The server dumps a state-of-the-program snapshot at regular intervals to protect against power failures and/or crashes.
He argued that the program requires a database and that I should use cron jobs to update the data. I could store movement information as start point, end point and speed, and update the position in the cron job (and calculate collisions with other objects there) from the direction and speed.
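(Concretely, the per-object position update would be something along these lines, with made-up field names, wherever it ends up running:)

// Position at time `now`, given a start point, an end point, a speed
// and the time the movement started (all hypothetical fields).
function positionAt(obj, now) {
  const dx = obj.end.x - obj.start.x;
  const dy = obj.end.y - obj.start.y;
  const dist = Math.sqrt(dx * dx + dy * dy);
  const travelled = Math.min(obj.speed * (now - obj.startTime), dist);
  const f = dist === 0 ? 0 : travelled / dist;
  return { x: obj.start.x + f * dx, y: obj.start.y + f * dy };
}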
His reasons:
Requires more CPU & Memory because it runs constantly.
Powerfailures/Crashes might destroy data.
Databases are faster.
My reasons against this are mostly:
Not very precise as events can only occur at full minutes (wouldn't be that bad though).
Requires (possibly costly) transformation of data on every run from relational data to objects.
RDBMS are a general solution for a specialized problem so a specialized solution should be more efficient.
Power failures (or other crashes) can leave the data in an undefined state, with only partially updated data, unless (possibly costly) precautions like transactions are taken.
What are your opinions about that?
Which arguments can you add for any side?
Databases are not faster. How silly... How can a database be faster than writing a custom data structure and storing it in memory? Databases are generalized tools that persist data to disk for you so you don't have to write all the code to do that yourself. Because they have to address the needs of numerous disparate (and sometimes inconsistent) business functions (persistency/durability, transactional integrity, caching, relational integrity, atomicity, etc.) and do it in a way that protects the application developer from having to worry about it so much, by definition they are going to be slower. That doesn't necessarily mean his conclusion is wrong, however.
Each of his other objections can be addressed by writing the code to handle that issue yourself... But you see where that is going... At some point, the development effort of writing custom code for the issues that matter to your application outweighs the performance hit of just using a database, which already does all that stuff out of the box... How many of these issues are important? And do you know how to write the code necessary to address them?
From what you've described here, I'd say your solution does seem to be the better option. You say it runs once a minute, but how long does it take to run? If only a few seconds, then the transformation to relational data would likely be inconsequential, as would any other overhead; most of this would likely take 30 seconds at most. This assumes, again, that the program is quite small.
However, if it is larger, and assuming that it will get larger, doing a straight dump is a better method. You might not want to do a full dump every run, but that's up to you, just remember that it could wind up taking a lot of space (same goes if you're using a database).
If you're going to dump the state, you would need some sort of redundancy system in place, along with quasi-transactions. You would want to store several copies in case something happens to the newest version, say the power goes out while you're storing and you have no backups beyond that half-written one. For the transaction part, you would need something to tell you that the file has been fully written, so if something does go wrong you can always tell which was the most recent successful save.
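Here is a rough Node.js sketch of that dump-with-precautions idea (the paths and copy count are arbitrary): write to a temporary file and rename it into place, so a half-written dump never replaces the last good one, and rotate a couple of older copies.

import { writeFileSync, renameSync, copyFileSync, existsSync } from "fs";

const STATE_FILE = "state.json";
const KEEP_COPIES = 3;

function dumpState(state) {
  // Rotate older successful dumps: state.json.1, state.json.2, ...
  for (let i = KEEP_COPIES - 1; i >= 1; i--) {
    if (existsSync(`${STATE_FILE}.${i}`)) {
      copyFileSync(`${STATE_FILE}.${i}`, `${STATE_FILE}.${i + 1}`);
    }
  }
  if (existsSync(STATE_FILE)) copyFileSync(STATE_FILE, `${STATE_FILE}.1`);

  // Write to a temp file first, then rename: the rename is atomic on most
  // filesystems, so a crash mid-write can't corrupt state.json itself.
  writeFileSync(`${STATE_FILE}.tmp`, JSON.stringify(state));
  renameSync(`${STATE_FILE}.tmp`, STATE_FILE);
}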
Oh, and as for his argument about it running constantly: if you have it set up as a cron job, or even as a self-contained sleep loop or similar, it doesn't use any CPU time when it's not running, just as it wouldn't if you were using an RDBMS.
If you're writing straight to disk, then this will be the faster method over a database, and faster retrieval, since, as you pointed out, there is no overhead.
Summary: A database is a good idea if you have a lot of idle processor time or historical records, but if resources are a legitimate concern, then it can become too much overhead and a dump with precautions taken is better.
MySQL can now model spatial data.
http://dev.mysql.com/doc/refman/4.1/en/gis-introduction.html
http://dev.mysql.com/doc/refman/5.1/en/spatial-extensions.html
You could use the database to keep track of world locations, user locations, item locations, etc.