What is a MongoDB driver in layman's terms?

I've been scouring the MongoDB documentation, Google, Stack Overflow and YouTube... but I still can't seem to understand what a driver is used for in MongoDB.
I do know that different programming languages can have one or many different drivers - but why do I need one?

You don't, strictly speaking, need one, but the alternative is hand-building network packets, scattered around your code base... The term 'driver' is a bit misleading, because most people expect some kernel-level program that talks to hardware.
The MongoDB driver is more like an SDK or a helper library that helps you with a number of tasks that you'll almost certainly need to solve when you want to use MongoDB.
In essence, the MongoDB driver does these things:
1. It implements the MongoDB wire protocol that is used to talk to the database, i.e. it knows what 'messages' the database expects, it knows the relevant constants, etc. 'It implements the MongoDB API', if you will.
2. It comes with helpers to manage the actual TCP/IP sockets: creating them, resolving replica set addresses, implementing connection pooling, etc.
3. It contains helpers that make it easier to work with the BSON datatypes from your language, since there normally isn't a 1:1 mapping of types. A MongoDB array, for example, could be mapped to an array or some kind of list or set container in most languages; ObjectId and ISODate might need a wrapper, and so on.
4. It implements a serializer, that is, a piece of software that can create a copy of an instance 'from the outside', without you having to implement a Serialize() method on each and every class (or whatever concept your language supports) that you want to store. Together with 3), this writes the BSON representation of your data.
Serialization in itself isn't trivial, because one quickly has to cope with cyclical references, so a recursive algorithm over a set of unknown properties is required. If that doesn't sound complicated enough, the de-serialization (or hydration) of objects is even more painful, so it's not exactly the kind of code that is rewarding to write, unless it's highly reusable.
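To make the cycle problem concrete, here is a tiny sketch (the object names are made up) of how naive serialization falls over on a cyclical reference:

```typescript
// Two objects that reference each other: a -> b -> a.
const a: any = { name: "a" };
const b: any = { name: "b", parent: a };
a.child = b;

// A naive serializer that just walks properties would recurse forever;
// JSON.stringify detects the cycle and throws instead.
try {
  JSON.stringify(a);
} catch (e) {
  console.log((e as Error).message); // e.g. "Converting circular structure to JSON"
}
```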
I'm sure I forgot something else the drivers do, but I think these are the key pain points they solve. As far as I know, their exact feature set varies from language to language, and in some languages the individual problems may be more or less pronounced, but they generally exist everywhere.
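To see what this looks like in practice, here is a minimal sketch using the official MongoDB Node.js driver; the connection string, database, and collection names are assumptions made for the example:

```typescript
import { MongoClient } from "mongodb";

async function main(): Promise<void> {
  // The driver handles the sockets, the wire-protocol handshake, and pooling.
  const client = new MongoClient("mongodb://localhost:27017");
  await client.connect();
  const users = client.db("demo").collection("users");

  // The driver serializes this plain object to BSON: the Date becomes an
  // ISODate, the array stays an array, and an ObjectId _id is generated.
  const { insertedId } = await users.insertOne({
    name: "Ada",
    tags: ["admin", "beta"],
    createdAt: new Date(),
  });

  // ...and deserializes (hydrates) BSON back into JavaScript values.
  const doc = await users.findOne({ _id: insertedId });
  console.log(doc);

  await client.close();
}

main().catch(console.error);
```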

Related

Key value oriented database vs document oriented database

I have recently started learning NoSQL databases and I came across key-value oriented databases and document oriented databases. Since they have a similar structure, aren't they saved and retrieved the exact same way? And if that is the case, why do we define them as separate types? Otherwise, how are they saved in the file system?
To get started, it is better to pin down the least wrong vocabulary. What used to be called NoSQL is too broad in scope, and often there is no feature-wise intersection between two databases that are dubbed NoSQL, except for the fact that they somehow deal with "data". What program does not deal with data?!

In the same spirit, I avoid the term Relational Database Management System (RDBMS). It is clear to most speakers and listeners that RDBMS means something like SQL Server, some kind of Oracle database, MySQL, or PostgreSQL. It is fuzzy whether that includes SQLite, which is already an indicator that "relational database" ain't the perfect word to describe the concept behind it. Even more so, what people usually call NoSQL never forbids relations. Even on top of "key-value" stores, one can build relations. In a Resource Description Framework database, the equivalents of SQL rows are called tuples, triples, quads, and more generally and more simply: relations. Another example of a relational database is one powered by Datalog. So "RDBMS" and "relational database" are not good words to describe the intended concepts, and when someone uses them, they express only the narrow view they have of the various paradigms that exist in the data(base) world.
In my opinion, it is better to speak of "SQL databases": databases that support a subset or superset of the SQL programming language as defined by the ISO standard.
Then the NoSQL wording makes sense: databases that do not support the SQL programming language. In particular, that excludes Cassandra and Neo4j, which can be programmed with languages (CQL and Cypher / GQL respectively) whose surface syntax looks like SQL but which do not have the semantics of SQL (neither a superset nor a subset of it). That leaves Google BigQuery, which feels a lot like SQL, but I am not familiar enough with it to be able to draw a line.
"Key-value store" is also fuzzy. memcached, Redis, FoundationDB, WiredTiger, dbm, Tokyo Cabinet et al. are very different from each other and are used in very different use cases.
Sorry, "document-oriented database" is not precise enough either. Historically, there were two main so-called document databases: Elasticsearch and MongoDB. And those, yet again, are very different pieces of software that, when used properly, do not solve the same problems.
You might have guessed it already: your question shows a lack of research and, as phrased (even setting aside the yak shaving about database vocabulary), is too broad.
Since they have a similar structure,
No.
aren't they saved and retrieved the exact same way?
No.
And if that is the case then why do we define them as separate types?
Their programming interfaces, deployment strategies, internal structures, and intended use cases are quite different; see the sketch below for the interface part.
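One way to see the interface difference: a key-value store treats the stored value as an opaque blob addressed only by its key, while a document store understands the structure of what it stores and can query inside it. A hedged sketch, assuming a local Redis and MongoDB and using the ioredis and mongodb Node.js clients (names and connection details are illustrative):

```typescript
import Redis from "ioredis";
import { MongoClient } from "mongodb";

async function demo(): Promise<void> {
  // Key-value store: the value is opaque; you can only address it by key.
  const redis = new Redis(); // assumes localhost:6379
  await redis.set("user:42", JSON.stringify({ name: "Ada", role: "admin" }));
  const blob = await redis.get("user:42"); // the whole blob or null; no querying inside

  // Document store: the server understands the structure and can query by field.
  const mongo = new MongoClient("mongodb://localhost:27017");
  await mongo.connect();
  const users = mongo.db("demo").collection("users");
  await users.insertOne({ name: "Ada", role: "admin" });
  const admins = await users.find({ role: "admin" }).toArray();

  console.log(blob, admins);
  redis.disconnect();
  await mongo.close();
}

demo().catch(console.error);
```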
Otherwise, how are they saved in the file system?
That question alone is too broad. You need to ask a specific question: at the very least, explain your understanding of how one or more databases work, and say where you want to go / what you want to understand, i.e. "how to go from point-A understanding (given) to point-B understanding (question)". In your question, point A is absent and point B is fuzzy or too broad.
Moar:
First, make sure you have a solid understanding of an SQL database, at the very least the SQL language (then dive into indices and, at last, fine-tuning). Without SQL knowledge, you are worthless on the job market. If you already have a good grasp of SQL, my recommendation is to forgo everything else but FoundationDB.
If you still want to "benchmark" databases, first set up a situation (real or imaginary), i.e. a project that you know well and that requires a database. Try to fit several databases to the problems of that project.
Lastly, if you have a precise project in mind, try to answer the following questions, prior to asking another question on database-design:
What guarantees do you need? Question all the properties of ACID: Atomicity, Consistency, Isolation, Durability. Look into BASE. You do not necessarily need ACID or BASE, but they are a well-documented basis for figuring out where you want / need to go.
What is the size of the data?
What is the shape of the data? Are the types well defined? Are they polymorphic (heterogeneous shapes)?
Workload: write-once then read-only, mostly reads, mostly writes, or a mix of both? Also answer how fast or slow writes and reads may be.
Querying: what do queries look like: recursive / deep, columns or rows, or neighborhood queries (like GraphQL, and SQL without recursive queries, do)? Again, what is the expected response time?
Do not forget to at least review deployment and scaling strategies before committing to a particular solution.
On my side, I picked FoundationDB because it is the most versatile in those regards, even if at the moment it requires some code to be a drop-in replacement for all PostgreSQL features.

share backbone code on client and server side with mongodb storage

I'm looking for a way to write the models only once for a Backbone, MongoDB, Node.js based app.
The storage can be server side only, but I still need proper model definitions on both the server and the client. On the server side I've decided to go with MongoDB.
After all the only thing I've found is https://github.com/donedotcom/backbone-mongodb.
I think I've understood Backbone quite well, but I have never used MongoDB before, and I can't figure out how to really use backbone-mongodb. Could someone tell me how it complements Backbone, what Document and EmbeddedDocument are meant for, and how they relate to Backbone.Model? Does this have anything to do with code sharing between client and server?
Of course, my idea would be to share the model definitions and validation (done mostly with backbone-validation) between the server and the client.
thanks, Viktor
I've just finished rewriting backbone-mongodb.
There is an example todo application (stay with commit eb935ae7480c18c9d6fcf2f5a2187cdff3d17a13) available as well.
TL;DR
Document <-> Backbone.Model
Read and write data on Node.js by overriding Backbone.sync.
EmbeddedDocument: no exact match; probably possible to implement via Backbone-relational, some assembly required.
Long read
Since MongoDB is a document-centric database, Backbone Models will fit Mongo's Documents quite nicely. You can think of MongoDB's Documents as searchable JSON blobs (..oversimplification for the sake of getting started, but still). They will more or less be an exact match to Backbone's Models. EmbeddedDocuments correspond somewhat (..oversimplification again, same reason) to related tables in traditional relational systems. They don't have an exact match in the Backbone world, but you could possibly use Backbone-relational to handle them in your Node application. I haven't tried it, but I'm making a qualified guess that it will need a certain amount of hand-holding.
On the Node side, you'll want to override Backbone.sync, probably globally, to read and write Model objects to MongoDB Documents.
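A hedged sketch of what such a global override could look like, using the official mongodb Node.js driver; the database and collection names, the id-to-_id mapping, and the connection handling are all assumptions made for the example:

```typescript
import Backbone from "backbone";
import { MongoClient, ObjectId } from "mongodb";

// Real code would await client.connect() during startup or rely on the
// driver's lazy connect; elided here to keep the sketch short.
const client = new MongoClient("mongodb://localhost:27017");
const models = client.db("app").collection("models");

// Replace Backbone's default AJAX-based sync with a MongoDB-backed one.
(Backbone as any).sync = (method: string, model: any, options: any = {}) => {
  const ok = (resp: unknown) => options.success && options.success(resp);
  const fail = (err: unknown) => options.error && options.error(err);

  switch (method) {
    case "create":
      models
        .insertOne(model.toJSON())
        .then((r) => ok({ ...model.toJSON(), _id: r.insertedId }), fail);
      break;
    case "read":
      models.findOne({ _id: new ObjectId(model.id) }).then(ok, fail);
      break;
    case "update":
      models
        .replaceOne({ _id: new ObjectId(model.id) }, model.toJSON())
        .then(() => ok(model.toJSON()), fail);
      break;
    case "delete":
      models.deleteOne({ _id: new ObjectId(model.id) }).then(() => ok({}), fail);
      break;
  }
};
```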
Also, embedded documents are just that: the actual data stored inside another object, not a link to that data stored independently (docs). It's also possible to do links, which are more like traditional relations (see the same link).
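For illustration, two hypothetical document shapes (all field names are made up):

```typescript
// Embedded: the address lives inside the user document itself.
const userWithEmbeddedDoc = {
  name: "Ada",
  address: { street: "1 Main St", city: "London" }, // embedded document
};

// Linked: the user stores only a reference; the address is a separate
// document in another collection, resolved with a second query.
const userWithLink = {
  name: "Grace",
  addressId: "507f191e810c19729de860ea", // points into an 'addresses' collection
};
```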
To be able to correctly program something with this combination, I think you should read at least a bit more about MongoDB. Here are some pointers:
Getting started with MongoDB and Python, Python-centric but still a very good introduction to MongoDB.
Have you checked out this MongoDB port of the typical Backbone Todo?
Here's another example of someone describing a webapp using Node & MongoDB. It's not Backbone-driven but it'll still show you a lot about how to work with MongoDB from Node.js.

recommendations for a dbms for an EAV system with mostly insert and select operations needs on .net stack

In the project I have been working on, the data modeling requirements are:
A system consisting of N clients, each having N events. An event is an entity with a required name and a timestamp at which it occurs. Optionally, an event may have N properties (key/value pairs) defining attributes that a client wants to store with that particular instance of the event.
The system will have mostly:
inserts – events are logged but never updated.
selects – reports/actions will be generated/executed based on events and properties in any possible combination.
The requirements reflect an entity-attribute-value (EAV) data model. After researching for some time, I feel that a relational DBMS like SQL Server might not be a good fit for this. (Correct me if I'm wrong!)
So I'm leaning toward a NoSQL option like MongoDB/CouchDB/RavenDB, etc.
My questions are:
What is the best fit among the available NoSQL solutions, given my system's heavy insert/select needs?
I'm also open to a relational option if these requirements can be translated into a relational schema. I personally doubt this, but after reading performance DBA answers (like the one referenced here), I got curious. However, I couldn't figure out an optimal relational model for my requirements myself, perhaps because the system is rather generic.
thanks!
MongoDB really shines when you write unstructured data to it (like your events). It can also sustain a pretty heavy write load. However, it's not very good for reporting; at least, not reporting in the traditional sense.
So, if your reporting needs are simple, you might get away with some simple map-reduce jobs. Otherwise you can export data to a relational database (nightly job, for example) and report the hell out of it.
Such hybrid solutions are pretty common (in my experience).
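As a hedged sketch of what logging such an event and running a simple report could look like with the MongoDB Node.js driver (database, collection, and field names are assumptions; the aggregation pipeline stands in for the map-reduce jobs mentioned above):

```typescript
import { MongoClient } from "mongodb";

async function run(): Promise<void> {
  const client = new MongoClient("mongodb://localhost:27017");
  await client.connect();
  const events = client.db("tracking").collection("events");

  // Insert-only workload: each event is written once and never updated.
  // The optional key/value properties are embedded directly in the document.
  await events.insertOne({
    clientId: 123,
    name: "checkout",
    timestamp: new Date(),
    properties: { plan: "pro", amount: 49.9 },
  });

  // A simple "report": count events per name for one client.
  const report = await events
    .aggregate([
      { $match: { clientId: 123 } },
      { $group: { _id: "$name", total: { $sum: 1 } } },
    ])
    .toArray();
  console.log(report);

  await client.close();
}

run().catch(console.error);
```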

Where can I get the ANSI or ISO standards for the RDBMS queries?

I want to write some queries which can work in almost all databases without any SQLExceptions. So, where can I get the ANSI standards for writing the queries?
Not sure that'll help you.
Vendors are touch and go as far as standards implementation goes, and often the standards themselves are imprecise enough that you could never write a query that would work with every implementation.
For example, SQL-92 defines the concatenation operator as ||, but neither MySQL nor MSSQL supports this (Oracle does). Vendor-independent string concatenation is impossible.
Similarly, a standard escape character is not specified, so how you handle that might not work with all vendors.
Having said that:
SQL 92:
http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
Wiki article with links to SQL 99 ISO documents:
http://en.wikipedia.org/wiki/SQL:1999
From wikipedia:
The SQL standard is not freely available. The whole standard may be purchased from the ISO as ISO/IEC 9075(1-4,9-11,13,14):2008.
Nevertheless, I would not advise you to follow this strategy, because no database engine follows any SQL standard (SQL 99, 2003, etc.) to the letter. All of them take liberties in the way they handle instructions or define variables (for example, when comparing two strings, different engines handle case sensitivity differently). A method that is very efficient with one engine can be terribly inefficient with another.
A suggestion would be to develop a standard group of queries and different classes that contain the specific implementation of each query for a given target RDBMS.
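A minimal sketch of that idea; the interface and the vendor-specific SQL strings are illustrative, not from any particular library:

```typescript
// One logical query, one implementation per target RDBMS.
interface QueryDialect {
  fullName(): string; // vendor-specific SQL for concatenating two columns
}

const ansiDialect: QueryDialect = {
  // SQL-92 operator, supported by Oracle and PostgreSQL among others.
  fullName: () => "SELECT first_name || ' ' || last_name FROM users",
};

const mysqlDialect: QueryDialect = {
  // MySQL treats || as logical OR by default, so use CONCAT() instead.
  fullName: () => "SELECT CONCAT(first_name, ' ', last_name) FROM users",
};

const mssqlDialect: QueryDialect = {
  // SQL Server uses + for string concatenation.
  fullName: () => "SELECT first_name + ' ' + last_name FROM users",
};

// The application picks a dialect once, at configuration time.
function buildReportQuery(dialect: QueryDialect): string {
  return dialect.fullName();
}
```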
Hope this helped
Check out the BNF of the core SQL grammars available at http://savage.net.au/SQL/
This is part of the answer; the rest, as pointed out by Kiranu and MattMitchell, is that different vendors implement the standard differently. No DBMS adheres perfectly even to SQL-92, though most are pretty close.
One observation: the SQL standard says nothing about indexes, so there is no standard syntax for creating an index. It also says nothing about how to create a database; each vendor has its own mechanism for doing that.
The SQL-92 standard is probably the one you want to target. I believe it's supported by most of the major RDBMSs.
Here is a less terse link. Sample content:
PostgreSQL: Has views. Breaks the standard by not allowing updates to views...
DB2: Conforms to at least SQL-92.
MSSQL: Conforms to at least SQL-92.
MySQL: Conforms to at least SQL-92.
Oracle: Conforms to at least SQL-92.
Informix: Conforms to at least SQL-92.
Something else you might consider, if you're using .NET, is the factory pattern in System.Data.Common, which does a good job of abstracting provider specifics for a number of RDBMSs.
If you are trying to make a product that will work against multiple databases, I think trying to use only standard SQL is not the way to go, as other answers have indicated, due to the different 'interpretations' of the standard. Instead, you should, if possible, have some kind of data access layer in your application with a different implementation specific to each database. Depending on what you are trying to do, tools such as Hibernate will do a lot of the heavy lifting in this regard for you.

why does memcached not support "multi set"

Can anyone explain why the memcached folks decided to support multi-get but not multi-set?
By multi I mean operation involving more than one key (see protocol at http://code.google.com/p/memcached/wiki/NewCommands).
So you can get multiple keys in one shot (the basic advantage being the standard saving you get from fewer round trips), but why can't you do bulk sets?
My theory is that the design intent was to do fewer sets, issued individually (e.g. on a cache read and miss). But I still do not see how multi-set really conflicts with the general philosophy of memcached.
I looked at the client features at http://code.google.com/p/memcached/wiki/NewCommonFeatures and it seems that some clients potentially do support "multi-set" (why only in the binary protocol?). I am using the Java spymemcached client, btw.
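For reference, a hedged sketch of multi-get versus individual sets as seen from a client API; this uses the npm memcached package so the example stays self-contained, and the spymemcached Java API differs in detail but has the same shape:

```typescript
import Memcached from "memcached";

const cache = new Memcached("localhost:11211"); // assumed local memcached

// Multi-get: several keys fetched in one round trip.
cache.getMulti(["user:1", "user:2", "user:3"], (err, results) => {
  if (err) throw err;
  console.log(results); // an object keyed by the keys that hit the cache
});

// No multi-set in the text protocol: sets go one key at a time, though a
// client may still pipeline them over the same connection.
cache.set("user:1", { name: "Ada" }, 60, (err) => {
  if (err) throw err;
});
```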
It's not supported in the text protocol because it would be very, very complicated to express, no clients would support it, and it would provide very little that you can't already do with the text protocol.
It's supported in the binary protocol because it's a trivial use case of binary operations.
spymemcached supports it implicitly -- just do a bunch of sets and magic happens:
http://dustin.github.com/2009/09/23/spymemcached-optimizations.html
I don't know a lot about memcached internals, but I assume writes have to be blocking, atomic operations. I assume that by allowing multiple set operations to be batched, you could block all reads for a long time (or risk a get occurring while only half of a write had been applied). Forcing writes to be done individually allows them to be interleaved fairly with gets.
I would imagine that the restriction against multi-set is there to avoid collisions when writing cached values to memcached.
As an object cache, I can't foresee a case where you would need transactional-style writes. This use case seems less suited to a caching layer and better suited to the underlying database.
If sets come in interleaved from different clients, it is most likely the case that, for one key, the last one wins, or is at least close enough until the cache is invalidated and a newer value is written.
As Gian mentions, there don't seem to be any good reasons to block reads from the cache while several or many writes to the cache happen.