How can I add a new data structure in PostgreSQL? - postgresql

I created my own R-Tree and I would like to add it to PostgreSQL, I was reading about PostGis, However I don't kwow exactly How can I do that.

R-Tree is implemented in PostgreSQL as GiST index for two-dimensional geometric data types. To add your own implementation you should consider using GiST infrastructure too. Citing the documentation:
Traditionally, implementing a new index access method meant a lot of difficult work. It was necessary to understand the inner workings of the database, such as the lock manager and Write-Ahead Log. The GiST interface has a high level of abstraction, requiring the access method implementer only to implement the semantics of the data type being accessed. The GiST layer itself takes care of concurrency, logging and searching the tree structure.
So, first read this chapter to make sure you understand the concepts of index methods, operator classes and families.
Then, read about GiST index and its API. There you can find useful examples that will help you.
Also, a lot of helpful information you can find on development section of PostgreSQL site.
Any programming questions you may address to PostgreSQL developer's mailing list.

Related

Key value oriented database vs document oriented database

I have recently started learning NO SQL databases and I came across Key-Value oriented databases and Document oriented databases. Since they have a similar structure, aren't they saved and retrieved the exact same way? And if that is the case then why do we define them as separate types? Otherwise, how they are saved in the file system?
To get started it is better to pin point the least wrong vocabulary. What used to be called nosql is too broad in scope, and often there is no intersection feature-wise between two database that are dubbed nosql except for the fact that they somehow deal with "data". What program does not deal with data?! In the same spirit, I avoid the term Relational Database Management System (RDBMS). It is clear to most speakers and listeners that RDBMS is something among SQL Server, some kind of Oracle database, MySQL, PostgreSQL. It is fuzzy whether that includes SQLite, that is already an indicator, that "relational database" ain't the perfect word to describe the concept behind it. Even more so, what people usually call nosql never forbid relations. Even on top of "key-value" stores, one can build relations. In a Resource Description Framework database, the equivalent of SQL rows are called tuple, triple, quads and more generally and more simply: relations. Another example of relational database are database powered by datalog. So RDBMS and relational database is not a good word to describe the intended concepts, and when used by someone, only speak about the narrow view they have about the various paradigms that exists in the data(base) world.
In my opinion, it is better to speak of "SQL databases" that describe the databases that support a subset or superset of SQL programming language as defined by the ISO standard.
Then, the NoSQL wording makes sense: database that do not provide support for SQL programming language. In particular, that exclude Cassandra and Neo4J, that can be programmed with a language (respectivly CQL and Cypher / GQL) which surface syntax looks like SQL, but does not have the semantic of SQL (neither a superset, nor a subset of SQL). Remains Google BigQuery, which feels a lot like SQL, but I am not familiar enough with it to be able to draw a line.
Key-value store is also fuzzy. memcached, REDIS, foundationdb, wiredtiger, dbm, tokyo cabinet et. al are very different from each other and are used in verrrrrrrrrrry different use-cases.
Sorry, document-oriented database is not precise enough. Historically, they were two main databases, so called document database: ElasticSearch and MongoDB. And those yet-another-time, are very different software, and when used properly, do not solve the same problems.
You might have guessed it already, your question shows a lack of work, and as phrased, and even if I did not want to shave a yak regarding vocabulary related to databases, is too broad.
Since they have a similar structure,
No.
aren't they saved and retrieved the exact same way?
No.
And if that is the case then why do we define them as separate types?
Their programming interface, their deployment strategy and their internal structure, and intended use-cases are much different.
Otherwise, how they are saved in the file system?
That question alone is too broad, you need to ask a specific question at least explain your understanding of how one or more database work, and ask a question about where you want to go / what you want to understand. "How to go from point A-understanding (given), to point B-understanding (question)". In your question point A is absent, and point B is fuzzy or too-broad.
Moar:
First, make sure you have solid understanding of an SQL database, at the very least the SQL language (then dive into indices and at last fine-tuning). Without SQL knowledge, your are worthless on the job market. If you already have a good grasp of SQL, my recommendation is to forgo everything else but FoundationDB.
If you still want "benchmark" databases, first set a situation (real or imaginary) ie. a project that you know well, that requires a database. Try to fit several databases to solve the problems of that project.
Lastly, if you have a precise project in mind, try to answer the following questions, prior to asking another question on database-design:
What guarantees do you need. Question all the properties of ACID: Atomic, Consistent, Isolation, Durability. Look into BASE. You do not necessarily need ACID or BASE, but it is a good basis that is well documented to know where you want / need to go.
What is size of the data?
What is the shape of the data? Are they well defined types? Are they polymorphic types (heterogeneous shapes)?
Workload: Write-once then Read-only, mostly reads, mostly writes, a mix of both. Answer also the question how fast or slow can be writes or reads.
Querying: How queries look like: recursive / deep, columns or rows, or neighboor hood queries (like graphql and SQL without recursive queries do). Again what is the expected time to response.
Do not forgo to at least the review deployement and scaling strategies prior to commit to a particular solution.
On my side, I picked up foundationdb because it is the most versatile in those regards, even if at the moment it requires some code to be a drop-in replacement for all postgresql features.

business logic in stored procedures vs. middle layer

I'd like to use Postgres as web api storage backend. I certainly need (at least some) glue code to implement my REST interface (and/or WebSocket). I think about two options:
Implement most of the business logic as stored procedures, PL/SQL while using a very thin middle layer to handle the REST/websocket part.
middle layer implements most of the business logic, and reach Pg over it's abstract DB interface.
My question is what are the possible benefits/hindrances of the above designs compared to each other regarding flexibility, scalability, maintainability and availability?
I don't really care about the exact middle layer implementation (it can be either php, node.js, python or whatever), I'm interested in the benefits and pitfalls of the actual architectural design choice.
I'm aware of that I lose some flexibility by choosing (1) since it would be difficult to port the system to other than maybe oracle, and my users will be bound to postgres. In my case it's not very important, the database intended to be an integral part of the system anyway.
I'm especially interested in the benefits lost in case of choosing (2), and possible pitfalls of either case.
I think both options have their benefits and drawbacks.
(2) approach is good and known. Most simple applications and web services are using it. But sometimes, using stored procedure is much better than (2).
Here is some examples which, IMHO, are good to implement with stored procedures:
tracking changes of rows. I.e you have some table with items that are regularly updated and you want to have another table with all changes and dates of that changes for every item.
custom algorithms, if your functions can be used as expressions for indexing data.
you want to share some logic between several micro-services. If every micro-service are implemented using a different language, you have to re-implement some parts of the business logic for every language and micro-service. Using stored procedures obviously can help to avoid this.
Some benefits of (2) approach (with some "however" of course to confuse you :D):
You can use your favorite programing language to write business logic.
However: in (1) approach you can write procedures using pl/v8 or pl/php or pl/python or pl/whatever extension using your favorite language.
maintaning code is more easy than maintaining stored procedures.
However: there is some good methods to avoid such headaches with code maintenance. I.e: migrations, which is a good thing for every approach.
Also, you can put your functions into your own namespace, so to reinstall re-deploy procedures into database you have to just drop and re-create this namespace, not each function. This can be done with simple script.
you can use various ORM's to query data and got abstraction layers which can have much more complex logic and inheritance logic. In (1) it would be hard to use OOP patterns.
I think this is the most powerful argument against (1) approach, and I can't add any 'however' to this.

What is a MongoDB driver in layman's terms?

I've been scouring the MongoDB documentation, Google, Stackoverflow and YouTube... but I still can't seem to understand what a driver is used for in MongoDB.
I do know that different programming language can have one or many different drivers - but why do I need one?
You don't strictly speaking need one, but the alternative is building network packets manually scattered around in your code base... The term 'driver' is a bit irritating, because most people expect some kernel-level program that talks to hardware.
The MongoDB driver is more like an SDK or a helper library that helps you with a number of tasks that you'll almost certainly need to solve when you want to use MongoDB.
In essence, the MongoDB driver does these things:
it implements the MongoDB wire protocol that is used to talk to the database, i.e. it knows what 'messages' the database expects, it knows relevant constants, etc. 'It implements the MongoDB API' if you will.
It also comes with helpers to manage the actual TCP/IP sockets, creating them, resolving replica set addresses, implementing connection pooling, etc.
Next, the drivers contain helpers that make it easier to work with the BSON datatypes from your language, since there normally isn't a 1:1 mapping of types. A mongodb array, for example, could be mapped to an array or some kind of list or set container in most languages; ObjectId and ISODate might need a wrapper, and so on.
Lastly, the driver implements a serializer, that is, a piece of software that can create a copy of an instance 'from the outside', that is, without you having to implement a Serialize() method on each and every class (or whatever concept your language supports) you want to store. Together with 3), this writes the BSON representation of your data.
Serialization in itself isn't trivial, because one quickly has to cope with cyclical references, so a recursive algorithm on a set of unknown properties is required. If that doesn't sound complicated enough, the de-serialization (or hydration) of objects is even more painful, so it's not exactly the type of code that is super rewarding to write, unless it's highly reusable.
I'm sure I forgot something else the drivers do, but I think these are the key pain points they solve. As far as I know, their exact feature set varies from language to language and in some languages, the individual problems might be less or more pronounced, but they generally exist everywhere.

Databases non-ORM and Scala

What is the best non-ORM database to work with Scala? I find this link link text, but this does not answer my question fully.
Could be considered desirable features performance, scalability and facility to write complex structures of relationships between data.
Thanks
Do you mean non-relational? There are Scala client libraries/wrappers for many NoSQL databases, including Cassandra, MongoDB, Redis, Voldemort, CouchDB, etc.
If by "complex structures of relationships between data" you mean that you'd prefer not to have to normalize, any NoSQL database should do reasonably well.
However, note that none of them--to my knowledge--will do anything like enforcing a referential integrity constraint or dereferencing object navigation paths for you. For that you may want to consider a graph database or OODBMS; unfortunately I'm not aware of any that's open source, liberally licensed and clusterable.
Update: I just found OrientDB which actually meets all threetwo of these criteria.
Update 2: OrientDB's clustering support isn't released yet. As a wise man once said, two out of three ain't bad.
The best solution is probably not to worry about it...
Abstract away from the problem by using the pluggable-persistence support in Akka: http://doc.akkasource.org/persistence
Then you can try them all, and take your pick based on profiling results :)

(Non-Relational) DBMS Design Resource

As a personal project, I'm looking to build a rudimentary DBMS. I've read the relevant sections in Elmasri & Navathe (5ed), but could use a more focused text- something a bit more practical and detail-oriented, with real-world recommendations- as E&N only went so deep.
The rub is that I want to play with novel non-relational data models. While a lot of E&N was great- indexing implementation details in particular- the more advanced DBMS implementation was only targeted to a relational model.
I'd like to defer staring at DBMS source for a while if I can until I've got a better foundation. Any ideas?
First of all you have to understand the properties of each system. i can offer you to read this post. it's the first step to understand NOSQL or Not Only SQL.Secondly you can check this blog post to understand all these stuff visually.
Finally glance at open source projects such as Mongodb, Couchdb etc. to see the list you can go here
Actually, the first step would be to understand hierarchal, network, navigational, object models which are alternatives to relational. I'm not sure where XML fits in i.e. what model it is. As far as structure, research B-tree (not binary trees) implementation.