Using Postgres' external procedural languages over application code [closed] - postgresql

I am trying to figure out the advantages and disadvantages of using non-PL/pgSQL procedural languages (PL/Python, PL/Perl, PL/v8, etc.) to implement data-manipulation logic at the database level, instead of implementing it at the model/ORM level of the application framework that interacts with the database (Rails, Entity Framework, Django, etc.).
To give a concrete example, say, I have a table that contains Mustache templates, and I want to have them "rendered" somehow.
Table definition:
create table templates (
    id serial primary key,
    content text not null,
    data jsonb not null
);
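For illustration, a row might pair a Mustache template with its data (hypothetical values):

insert into templates (content, data)
values ('Hello, {{name}}!', '{"name": "World"}');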
Usually I would go to the model code and add an extra method to render the template. Example in Rails:
class Template < ApplicationRecord
  def rendered
    Mustache.render(content, data)
  end
end
However, I could also write a PL/Python function that would do just that but on the database level:
create or replace function fn_mustache(template text, data jsonb)
returns text
language plpython3u
as $$
import chevron
import json
return chevron.render(template, json.loads(data))
$$;
create view v_templates as
select id, content, data, fn_mustache(content, data) as rendered
from templates;
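Querying the view then yields the rendered text; continuing the sample row above:

select rendered from v_templates where id = 1;
-- rendered => 'Hello, World!'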
This yields virtually the same result functionality-wise. The example is very basic, but the idea is to use PL/Python (or others) to manipulate data in more advanced ways than PL/pgSQL allows for. That is, PL/pgSQL does not have the wealth of libraries that any generic programming language provides today (in the example I am relying on existing implementations of the Mustache templating system, which would not be practical to reimplement in PL/pgSQL). I obviously would not use PL/Python for any sort of networking or other OS-level features, but for operations exclusively on data this seems like a decent approach (change my mind).
Points that I can observe so far:
PL/Python is an "untrusted" language, which I guess makes it by definition more dangerous to write functions in, since you have access to syscalls; at least it feels like the cost of messing up a PL/Python function is higher than that of a mistake in the application layer, since the former is executed in the context of the database
The database approach is more extensible, since I am working at the level closest to the data, i.e. I am not scattering the presentation logic across multiple "tiers" (ORM and DB in this case). This means that if some other external service needs to interact with the data, I can plug it directly into the database, bypassing the application layer.
Implementing this at the model level just seems much simpler in execution
Supporting the application-code variant seems easier as well, since there are fewer concepts to keep in mind
What are the other advantages and disadvantages of these two approaches? (e.g. performance, maintainability)

You are wondering whether to have application logic inside the database or not. This is to a great extent a matter of taste. In the days of yore, the approach to implement application logic in database functions was more popular, but today it is usually frowned upon.
Extreme positions in this debate are
The application is implemented in the database to the extent that the database functions produce the HTML code that is sent to the client.
The database is just a dumb collection of tables with no triggers or constraints beyond a primary key, and the application tries to maintain data integrity.
The best solution is typically somewhere in the middle, but where is largely a matter of taste. You see that this is a typical opinion-based question. However, let me supply some arguments that help you make a decision.
Points speaking against application logic in the database:
It makes it more difficult to port to another database.
It is more complicated to develop and debug database functions than client code. For example, you won't have as advanced debugging tools.
The database machine has to perform not only the normal database workload, but also the application-code workload. But databases are harder to scale than application servers (you can't just spin up a second database to handle part of the workload).
PostgreSQL-specific: all database functions run inside a single database transaction, so you cannot implement functionality that requires more complicated transaction management.
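One caveat on that last point: since PostgreSQL 11, procedures (as opposed to functions) can issue transaction control statements. A minimal sketch, assuming hypothetical events and stats tables:

create procedure purge_old_events()
language plpgsql
as $$
begin
    delete from events where created_at < now() - interval '1 year';
    commit;  -- allowed in a procedure, forbidden in a function
    update stats set last_purge = now();
end;
$$;

call purge_old_events();  -- must not be wrapped in an outer transaction block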
Points speaking for application logic in the database:
It becomes easier to port to another application server or client programming language.
Less data has to be transferred between client and server, which can make processing more efficient.
The software stack becomes shorter and the overall software architecture simpler.
My personal opinion is that anything that has to do with basic data integrity should be implemented in the database:
Have foreign keys and check constraints in the database. The application will of course also respect these rules (no point in triggering a database error), but it is good for data integrity to have a safety net.
If you have to keep redundant information in the database, use triggers to make sure that all copies of a datum are kept synchronized. This implicitly makes use of transactional atomicity.
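A minimal sketch of both points, assuming hypothetical customers and orders tables (PostgreSQL 11+ trigger syntax):

create table orders (
    id serial primary key,
    customer_id integer not null references customers(id),
    quantity integer not null check (quantity > 0)
);

-- keep a redundant per-customer order counter synchronized
create function trg_count_orders() returns trigger
language plpgsql
as $$
begin
    update customers set order_count = order_count + 1
    where id = new.customer_id;
    return new;
end;
$$;

create trigger orders_count
after insert on orders
for each row execute function trg_count_orders();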
Anything that is more complicated is best done in the application. Be wary of database functions that are very long or complicated. Exceptions can be made for performance reasons: perhaps some complicated report could not easily be written in pure SQL, and shipping all the raw data to the client is prohibitively expensive.

Related

what is the purpose of business logic in the api layer when all of it can be done in sql functions? [closed]

I've been programming API/front-end apps for a while now, with dotnet webapi & Entity Framework for the ORM, then Spring Boot with JPA. Because of this my native SQL skills are mediocre, so I recently ditched ORMs in my side projects and resorted to calling native SQL functions/views/CRUD queries. After playing with these for a while I have so many questions. After using SQL functions, my API business layer barely has any logic; most of the code only does data validation and passes the parameters to the SQL function via JDBC or a JPA native query.
Is SQL functions doing most of the business logic normal, and is it good practice?
How do you handle error/success messages returned to the client? Right now I am returning JSON directly from PL/pgSQL, but I feel like it is not the best practice.
What is faster: the ORM, or using the ORM to call native SQL scripts?
Using SQL directly for CRUD operations is perfectly fine, I think.
ORM handles things like tracking which fields to update, in which order to do queries, mapping relations between objects, etc. But you can write maintainable and performant applications without it.
In theory, if you write your own SQL, it can be at least as fast or even faster than what an ORM does, but you need to remember to optimize things that an ORM would do out-of-the-box. Things like session caching, 2nd level caching, reusing prepared statements, batch processing, eager/lazy loading, etc.
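For instance, one of those optimizations, server-side prepared-statement reuse, looks roughly like this in plain SQL (hypothetical customers table); most ORMs and drivers arrange the equivalent for you:

-- parse and plan once per session...
prepare find_customer (integer) as
    select id, name from customers where id = $1;

-- ...then reuse the plan on every call
execute find_customer(42);
execute find_customer(43);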
There are some things that are much harder to implement and maintain in SQL, however. If your application is just CRUD on entities one at a time, then there is not much 'Business Logic' involved, and your SQL functions can perfectly handle this with simple INSERT/UPDATE/UPSERT/DELETE commands, and for read logic, I'd even recommend creating your own SELECT statements if you're good with it.
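As a sketch of what such a CRUD function might look like (hypothetical customers table), including returning JSON to the caller as the question describes:

create function create_customer(p_name text)
returns jsonb
language sql
as $$
    insert into customers (name)
    values (p_name)
    returning jsonb_build_object('id', id, 'name', name);
$$;

-- the API layer only validates input and forwards it:
select create_customer('Alice');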
What do I consider 'Business Logic':
non-trivial validation
updates on multiple rows at once
updates that span different tables
conditional operations that relate to multiple rows or multiple tables
interaction with the user: present a non-trivial view, follow a flow, give feedback, ...
For use cases like this, you should first write the CRUD operations, either with SQL or ORM, and then write the actual use case in a bit of code that is independent of the CRUD layer. That way it's easier later if you need to change anything: you know where to look for what functionality.
A question like this gets answers based on opinions. Here's mine:
Business logic written in SQL is normal, but what is your definition of "normal"? In databases like Oracle, PostgreSQL and SQL Server you can write simple functions and procedures to do whatever it takes for your business.
Good practice is, imho, to use a technology you and your team understand really well. When it's Python, use Python; when it's SQL, use SQL; etc.
In PostgreSQL you can use exceptions when errors occur; your application can translate these into something useful for the end user. That's something you always need, no matter where you do the business logic.
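A minimal sketch of that pattern (hypothetical accounts table): the function raises an exception with an error code, and the application catches the database error and maps it to a user-facing message:

create function withdraw(p_account integer, p_amount numeric)
returns numeric
language plpgsql
as $$
declare
    v_balance numeric;
begin
    update accounts set balance = balance - p_amount
    where id = p_account
    returning balance into v_balance;

    if v_balance is null or v_balance < 0 then
        raise exception 'account % missing or overdrawn', p_account
            using errcode = 'P0001';  -- aborts the statement; changes roll back
    end if;
    return v_balance;
end;
$$;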
"faster" can be related to development, to maintenance, but also to usage. Development and maintenance depends on your skills. In usage ORM is always slower, but the difference can be so futile that nobody cares. And an ORM can be much faster for development and maintenance.
We have almost all logic in the database, in SQL and PL/pgSQL, for raw speed where every millisecond counts. We just can't afford any network overhead. Is it the best? I don't know. Can we maintain it? Yes. Does it work? Yes. Is the customer happy? Yes, and that's the only thing that counts.

Key value oriented database vs document oriented database

I have recently started learning NO SQL databases and I came across Key-Value oriented databases and Document oriented databases. Since they have a similar structure, aren't they saved and retrieved the exact same way? And if that is the case then why do we define them as separate types? Otherwise, how they are saved in the file system?
To get started, it is better to pin down the least wrong vocabulary. What used to be called NoSQL is too broad in scope: often there is no feature-wise intersection between two databases that are dubbed NoSQL, except for the fact that they somehow deal with "data". What program does not deal with data?!

In the same spirit, I avoid the term Relational Database Management System (RDBMS). It is clear to most speakers and listeners that RDBMS means something like SQL Server, some kind of Oracle database, MySQL, or PostgreSQL. It is fuzzy whether that includes SQLite, which is already an indicator that "relational database" is not the perfect term for the concept behind it. Even more so, what people usually call NoSQL never forbids relations. Even on top of "key-value" stores one can build relations. In a Resource Description Framework database, the equivalents of SQL rows are called tuples, triples, quads, and, more generally and more simply, relations. Another example of a relational database is one powered by Datalog. So RDBMS and "relational database" are not good terms for the intended concepts, and whoever uses them speaks only about the narrow view they have of the various paradigms that exist in the data(base) world.
In my opinion, it is better to speak of "SQL databases", describing the databases that support a subset or superset of the SQL programming language as defined by the ISO standard.
Then the NoSQL wording makes sense: databases that do not provide support for the SQL programming language. In particular, that excludes Cassandra and Neo4j, which can be programmed with languages (respectively CQL and Cypher / GQL) whose surface syntax looks like SQL but which do not have the semantics of SQL (neither a superset nor a subset of it). That leaves Google BigQuery, which feels a lot like SQL, but I am not familiar enough with it to be able to draw a line.
"Key-value store" is also fuzzy: memcached, Redis, FoundationDB, WiredTiger, dbm, Tokyo Cabinet et al. are very different from each other and are used in very different use cases.
Sorry, "document-oriented database" is not precise enough either. Historically there were two main so-called document databases: ElasticSearch and MongoDB. And those, yet again, are very different pieces of software and, when used properly, do not solve the same problems.
You might have guessed it already: your question shows a lack of research and, as phrased, is too broad, even leaving aside the yak shaving about database vocabulary.
Since they have a similar structure,
No.
aren't they saved and retrieved the exact same way?
No.
And if that is the case then why do we define them as separate types?
Their programming interfaces, their deployment strategies, their internal structures, and their intended use cases are very different.
Otherwise, how they are saved in the file system?
That question alone is too broad. You need to ask a specific question: at least explain your understanding of how one or more databases work, and ask where you want to go / what you want to understand ("how to go from point A, the given understanding, to point B, the question"). In your question, point A is absent and point B is fuzzy or too broad.
Moar:
First, make sure you have a solid understanding of an SQL database, at the very least the SQL language (then dive into indices and, at last, fine-tuning). Without SQL knowledge you are worthless on the job market. If you already have a good grasp of SQL, my recommendation is to forgo everything else but FoundationDB.
If you still want to "benchmark" databases, first set up a situation (real or imaginary), i.e. a project that you know well that requires a database, and try to fit several databases to the problems of that project.
Lastly, if you have a precise project in mind, try to answer the following questions, prior to asking another question on database-design:
What guarantees do you need? Question all the properties of ACID: Atomicity, Consistency, Isolation, Durability. Look into BASE. You do not necessarily need ACID or BASE, but they are a good, well-documented basis for figuring out where you want / need to go.
What is the size of the data?
What is the shape of the data? Are they well-defined types? Are they polymorphic types (heterogeneous shapes)?
Workload: write-once then read-only, mostly reads, mostly writes, or a mix of both. Also answer how fast or slow writes and reads may be.
Querying: what the queries look like: recursive / deep, columns or rows, or neighborhood queries (like GraphQL, and SQL without recursive queries, allow). Again, what is the expected time to response?
Do not forget to at least review deployment and scaling strategies before committing to a particular solution.
On my side, I picked FoundationDB because it is the most versatile in those regards, even if at the moment it requires some code to be a drop-in replacement for all PostgreSQL features.

business logic in stored procedures vs. middle layer

I'd like to use Postgres as the storage backend for a web API. I certainly need (at least some) glue code to implement my REST interface (and/or WebSocket). I am thinking about two options:
Implement most of the business logic as stored procedures in PL/pgSQL, with a very thin middle layer to handle the REST/WebSocket part.
The middle layer implements most of the business logic and reaches Pg over its abstract DB interface.
My question is: what are the possible benefits/drawbacks of the above designs compared to each other, regarding flexibility, scalability, maintainability and availability?
I don't really care about the exact middle layer implementation (it can be either php, node.js, python or whatever), I'm interested in the benefits and pitfalls of the actual architectural design choice.
I'm aware that I lose some flexibility by choosing (1), since it would be difficult to port the system to anything other than maybe Oracle, and my users will be bound to Postgres. In my case it's not very important; the database is intended to be an integral part of the system anyway.
I'm especially interested in the benefits lost in case of choosing (2), and possible pitfalls of either case.
I think both options have their benefits and drawbacks.
Approach (2) is good and well known; most simple applications and web services use it. But sometimes using stored procedures is much better.
Here are some examples which, IMHO, are good to implement with stored procedures:
tracking changes of rows, e.g. you have a table with items that are regularly updated, and you want another table with all changes and the dates of those changes for every item (see the sketch after this list).
custom algorithms, when your functions can be used as expressions for indexing data.
you want to share some logic between several micro-services. If every micro-service is implemented in a different language, you have to re-implement parts of the business logic for every language and micro-service; using stored procedures obviously helps avoid this.
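A minimal sketch of the change-tracking example from the first bullet, assuming a hypothetical items table (PostgreSQL 11+ trigger syntax):

create table items_history (
    item_id integer not null,
    changed_at timestamptz not null default now(),
    old_row jsonb not null
);

-- record the previous state of the row on every update
create function trg_items_history() returns trigger
language plpgsql
as $$
begin
    insert into items_history (item_id, old_row)
    values (old.id, to_jsonb(old));
    return new;
end;
$$;

create trigger items_history
after update on items
for each row execute function trg_items_history();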
Some benefits of approach (2) (with some "however"s, of course, to confuse you :D):
You can use your favorite programming language to write business logic.
However: in approach (1) you can write procedures using the pl/v8, pl/php, pl/python or pl/whatever extension, using your favorite language.
maintaining application code is easier than maintaining stored procedures.
However: there are good methods to avoid such headaches with code maintenance, e.g. migrations, which are a good thing for every approach.
Also, you can put your functions into their own namespace (schema), so to re-deploy the procedures you just drop and re-create that schema rather than each function. This can be done with a simple script (see the sketch after this list).
you can use various ORMs to query data and get abstraction layers which can carry much more complex logic and inheritance. In (1) it would be hard to use OOP patterns.
I think this is the most powerful argument against approach (1), and I can't add any "however" to it.
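And a minimal sketch of the namespace-based re-deployment mentioned above, assuming a hypothetical api schema and items table:

-- re-deploy every function in one shot by recreating its schema
drop schema if exists api cascade;
create schema api;

create function api.active_items()
returns setof items
language sql
as $$
    select * from items where active;
$$;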

recommendations for a dbms for an EAV system with mostly insert and select operation needs on a .net stack

In the project I have been working on, the data modeling requirements are:
A system consisting of N clients, each having N events. An event is an entity with a required name and the timestamp at which it occurs. Optionally, an event may have N properties (key/value pairs) defining attributes that a client wants to store with that particular instance of the event.
The system will have mostly:
inserts – events are logged but never updated.
selects – reports/actions will be generated/executed based on events and properties in any possible combination.
The requirements reflect an entity-attribute-value (EAV) data model. After researching for some time, I feel that a relational DBMS like SQL Server might not be a good fit for this. (Correct me if I'm wrong!)
So I'm leaning toward a NoSQL option like MongoDB/CouchDB/RavenDB etc.
My questions are:
What is the best fit among the available NoSQL solutions, keeping in view my system's heavy insert/select needs?
I'm also open to the relational option if these requirements can be translated into a relational schema. I personally doubt this, but after reading performance-DBA answers (like the one referenced here), I got curious. However, I couldn't figure out an optimal relational model for my requirements myself, perhaps because the system is rather generic.
thanks!
MongoDB really shines when you write unstructured data to it (like your event). Also, it is able to sustain pretty heavy write load. However, it's not very good for reporting. At least, for reporting in the traditional sense.
So, if your reporting needs are simple, you might get away with some simple map-reduce jobs. Otherwise you can export data to a relational database (nightly job, for example) and report the hell out of it.
Such a hybrid solution is pretty common (in my experience).
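For the relational side of such a hybrid, a minimal EAV-style sketch (hypothetical names) could look like this:

create table events (
    id bigserial primary key,
    client_id integer not null,
    name text not null,
    occurred_at timestamptz not null
);

create table event_properties (
    event_id bigint not null references events(id),
    key text not null,
    value text not null,
    primary key (event_id, key)
);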

Key-Value Stores vs. RDBMs vs. "Cloud" DBs (SDB) [closed]

I'm comfortable in the MySQL space, having designed several apps over the past few years and continuously refined their performance and scalability. I also have some experience working with memcached to provide application-side speed-ups on frequently queried result sets. And recently I implemented Amazon SDB as my primary "database" for an e-commerce experiment.
To oversimplify, the quick justification I went through in my mind for using the SDB service was that a schema-less database structure would allow me to focus on the logical problem of my project and rapidly accumulate content in my data store. That is, don't worry about setting up and normalizing all possible permutations of a product's attributes beforehand; simply start loading in the products, and SDB will simply remember everything that is available.
Now that I have managed to get through the first few iterations of my project and need to set up simple interfaces to the data, I am running into issues that I had taken for granted when working with MySQL, e.g. grouping in select statements and limit syntax to query "items 50 to 100". The ease advantage I gained from SDB's schema-free architecture, I lost to the performance hit of querying/looping over a result set of just over 1800 items.
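Both of those are one-liners in SQL; a sketch against a hypothetical products table:

-- grouping
select category, count(*) from products group by category;

-- "items 50 to 100"
select * from products order by id limit 51 offset 49;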
Now I'm reading about projects like Tokyo Cabinet that extend the concept of in-memory key-value stores to provide pseudo-relational functionality at ridiculously faster speeds (14x, I read somewhere).
My question:
Are there some rudimentary guidelines or heuristics that I, as an application designer/developer, can go through to evaluate which DB technology is the most appropriate at each stage of my project?
Ex: at a prototyping stage, where the logical/technical unknowns of the application keep the data structure fluid: use SDB.
At a more mature stage, where user deliverables are a priority: use traditional tools where you don't have to spend dev time writing sorting, grouping or pagination logic.
Practical experience with these tools would be very much appreciated.
Thanks SO!
Shaheeb R.
The problems you are finding are why RDBMS specialists view some of the alternative systems with a jaundiced eye. Yes, the alternative systems handle certain specific requirements extremely fast, but as soon as you want to do something else with the same data, the fleetest suddenly becomes the laggard. By contrast, an RDBMS typically manages the variations with greater aplomb; it may not be quite as fast as the fleetest for the specialized workload which the fleetest is micro-optimized to handle, but it seldom deteriorates as fast when called upon to deal with other queries.
The new solutions are not silver bullets.
Compared to traditional RDBMS, these systems make improvements in some aspect (scalability, availability or simplicity) by trading-off other aspects (reduced query capability, eventual consistency, horrible performance for certain operations).
Think of these not as replacements for the traditional database, but as specialized tools for known, specific needs.
Take Amazon SimpleDB, for example: SDB is basically a huge spreadsheet. If that is what your data looks like, then it will probably work well, and the superb scalability and simplicity will save you a lot of time and money.
If your system requires very structured and complex queries but you insist on one of these cool new solutions, you will soon find yourself in the middle of re-implementing an amateurish, ill-designed RDBMS, with all of its inherent problems.
In this respect, if you do not know whether these will suit your needs, I think it is actually better to do your first few iterations in a traditional RDBMS, because it gives you the best flexibility and capability, especially in a single-server deployment and under modest load (see the CAP theorem).
Once you have a better idea of what your data will look like and how it will be used, you can match your needs with an alternative solution.
If you want the simplicity of a cloud-hosted solution but need a relational database, you can check out Amazon Relational Database Service.