MongoDB: Stored Procedures [duplicate]

This question already has answers here:
MongoDB Stored Procedure Equivalent
(3 answers)
Closed 8 years ago.
As I have heard, MongoDB can store internal procedures.
How can I use them?
The official help is very short.
Can I use stored procedures to implement small logic on this layer?
The same as PL/pgSQL in Postgres.

The duplicate question (MongoDB Stored Procedure Equivalent) does explain that you can store a procedure within MongoDB that can be called via the eval() command; however, it doesn't really explain why this is a bad thing.
eval() is direct access to an almost unrestricted JS environment called from MongoDB's C++ code. It is also worth mentioning that injection through unescaped parameters is very easy.
These are not stored procedures that work within MongoDB's own runtime (unlike the stored procedures you are thinking of); the JS engine is run from MongoDB, but MongoDB is not programmed in JS; it is programmed in C++.
They are only available from a JS context, not from MongoDB's C++ context.
By default they can take a global lock even with the nolock option set; it all depends upon the operations you call, and the JS itself is extremely slow in comparison to MongoDB's native runtime.
As such:
Can I use stored procedures to implement small logic on this layer?
No. It is actually implemented on a third layer, separate from MongoDB.
MongoDB is designed to run this stuff from the client side; there is a 90% chance you will get no real benefit from using "stored procedures". In fact, in many ACID databases they are heavily abused and used in ways that actually slow down applications and make them more prone to failure. So you need to think very carefully about whether you really "need" them or not.
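For reference, the legacy pattern looked roughly like this in the mongo shell (a sketch only; db.eval() has been deprecated since MongoDB 3.0 and was removed in 4.2):
// Store a server-side function in the special system.js collection.
db.system.js.save({
    _id: "addNumbers",
    value: function (x, y) { return x + y; }
});

// Call it through eval: deprecated, takes a global lock, easy to abuse.
db.eval("addNumbers(3, 4)");

// Alternatively, load the stored functions into the current shell session
// and run them client-side instead.
db.loadServerScripts();
addNumbers(3, 4);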

Related

What is the purpose of business logic in the API layer when all of it can be done in SQL functions? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 14 days ago.
I've been programming API/front-end apps for a while now, first with the .NET Web API and Entity Framework as the ORM, then Spring Boot with JPA. Because of this, my native SQL skills are mediocre, so I recently ditched ORMs in my side projects and resorted to calling native SQL functions/views/CRUD queries. After playing with these for a while, I have so many questions. Since moving to SQL functions, my API business layer barely has any logic; most of the code only does data validation and passes the parameters to the SQL function via JDBC or a JPA native query.
Is it normal for SQL functions to do most of the business logic, and is it good practice?
How do you handle returning error/success messages to the client? Right now I am returning JSON directly from PL/pgSQL, but I feel like it is not the best practice.
What is faster: letting the ORM generate queries, or using the ORM to call native SQL scripts?
Using SQL directly for CRUD operations is perfectly fine, I think.
ORM handles things like tracking which fields to update, in which order to do queries, mapping relations between objects, etc. But you can write maintainable and performant applications without it.
In theory, if you write your own SQL, it can be at least as fast as, or even faster than, what an ORM does, but you need to remember to optimize things that an ORM would do out of the box: session caching, 2nd-level caching, reusing prepared statements, batch processing, eager/lazy loading, etc.
There are some things that are much harder to implement and maintain in SQL, however. If your application is just CRUD on entities one at a time, then there is not much 'Business Logic' involved, and your SQL functions can perfectly handle this with simple INSERT/UPDATE/UPSERT/DELETE commands (a sketch follows below); for read logic, I'd even recommend writing your own SELECT statements if you're good with it.
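A minimal sketch of such a thin CRUD function (the table, columns, and function name are hypothetical):
create or replace function upsert_customer(p_id int, p_name text)
returns void
language sql
as $$
    -- Plain UPSERT logic like this needs no application-side code.
    insert into customers (id, name)
    values (p_id, p_name)
    on conflict (id) do update set name = excluded.name;
$$;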
What do I consider 'Business Logic':
non-trivial validation
updates on multiple rows at once
updates that span different tables
conditional operations that relate to multiple rows or multiple tables
interaction with the user: present a non-trivial view, follow a flow, give feedback, ...
For use cases like this, you should first write the CRUD operations, either with SQL or ORM, and then write the actual use case in a bit of code that is independent of the CRUD layer. That way it's easier later if you need to change anything: you know where to look for what functionality.
A question like this gets answers based on opinions. Here's mine:
Business logic written in SQL is normal, but what is your definition of "normal"? In databases like Oracle, PostgreSQL and SQL Server you can write simple functions and procedures to do whatever it takes for your business.
Good practice is, imho, to use a technology you and your team understand really well. When it's Python, use Python; when it's SQL, use SQL; etc.
In PostgreSQL you can raise exceptions when errors occur, and your application can translate these into something useful for the end user. That's something you always need, no matter where you put the business logic.
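For example, a minimal sketch of that pattern (the table, function, and message are hypothetical):
create or replace function withdraw(p_account int, p_amount numeric)
returns void
language plpgsql
as $$
begin
    update accounts
       set balance = balance - p_amount
     where id = p_account
       and balance >= p_amount;

    -- FOUND is false when no row matched; raise a typed error that the
    -- application can catch and translate for the end user.
    if not found then
        raise exception 'insufficient funds on account %', p_account
            using errcode = 'P0001';
    end if;
end;
$$;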
"faster" can be related to development, to maintenance, but also to usage. Development and maintenance depends on your skills. In usage ORM is always slower, but the difference can be so futile that nobody cares. And an ORM can be much faster for development and maintenance.
We have almost all logic in the database, in SQL and PL/pgSQL, for raw speed where every millisecond counts. We just can't afford any network overhead. Is it the best? I don't know. Can we maintain it? Yes. Does it work? Yes. Is the customer happy? Yes, and that's the only thing that counts.

Key value oriented database vs document oriented database

I have recently started learning NoSQL databases and I came across key-value oriented databases and document oriented databases. Since they have a similar structure, aren't they saved and retrieved the exact same way? And if that is the case then why do we define them as separate types? Otherwise, how are they saved in the file system?
To get started, it is better to pinpoint the least wrong vocabulary. What used to be called NoSQL is too broad in scope, and often there is no feature-wise intersection between two databases that are dubbed NoSQL, except for the fact that they somehow deal with "data". What program does not deal with data?!
In the same spirit, I avoid the term Relational Database Management System (RDBMS). It is clear to most speakers and listeners that an RDBMS is something among SQL Server, some kind of Oracle database, MySQL, and PostgreSQL. It is fuzzy whether that includes SQLite, which is already an indicator that "relational database" is not the perfect term for the concept behind it. Even more so, what people usually call NoSQL never forbids relations. Even on top of key-value stores, one can build relations. In a Resource Description Framework database, the equivalents of SQL rows are called tuples, triples, quads, and, more generally and more simply, relations. Another example of a relational database is one powered by Datalog. So RDBMS and "relational database" are not good terms to describe the intended concepts, and when someone uses them, they speak only about the narrow view they have of the various paradigms that exist in the data(base) world.
In my opinion, it is better to speak of "SQL databases" to describe the databases that support a subset or superset of the SQL programming language as defined by the ISO standard.
Then the NoSQL wording makes sense: databases that do not provide support for the SQL programming language. In particular, that excludes Cassandra and Neo4j, which can be programmed with languages (respectively CQL and Cypher / GQL) whose surface syntax looks like SQL but which do not have the semantics of SQL (neither a superset nor a subset of it). That leaves Google BigQuery, which feels a lot like SQL, but I am not familiar enough with it to be able to draw a line.
"Key-value store" is also fuzzy. memcached, Redis, FoundationDB, WiredTiger, dbm, Tokyo Cabinet et al. are very different from each other and are used in very different use cases.
Sorry, "document-oriented database" is not precise enough either. Historically, there were two main so-called document databases: ElasticSearch and MongoDB. And those, yet again, are very different pieces of software and, when used properly, do not solve the same problems.
You might have guessed it already: your question shows a lack of research and, as phrased (even if I did not want to shave a yak over database vocabulary), is too broad.
Since they have a similar structure,
No.
aren't they saved and retrieved the exact same way?
No.
And if that is the case then why do we define them as separate types?
Their programming interfaces, their deployment strategies, their internal structures, and their intended use cases are very different (see the sketch after this exchange).
Otherwise, how are they saved in the file system?
That question alone is too broad. You need to ask a specific question, or at least explain your understanding of how one or more databases work, and ask a question about where you want to go / what you want to understand: "how to go from understanding point A (given) to understanding point B (question)". In your question, point A is absent and point B is fuzzy or too broad.
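To make the difference in programming interfaces concrete, here is a rough sketch using the Node.js clients for Redis and MongoDB (the connection details and data are made up):
const { createClient } = require("redis");
const { MongoClient } = require("mongodb");

async function demo() {
    // Key-value store: the value is an opaque blob; lookup is strictly by key.
    const kv = createClient();
    await kv.connect();
    await kv.set("user:42", JSON.stringify({ name: "Ada", city: "London" }));
    const user = JSON.parse(await kv.get("user:42"));

    // Document store: the server understands the document's structure,
    // so you can query (and index) on fields inside it.
    const mongo = await MongoClient.connect("mongodb://localhost:27017");
    const users = mongo.db("test").collection("users");
    await users.insertOne({ _id: 42, name: "Ada", city: "London" });
    const londoners = await users.find({ city: "London" }).toArray();

    console.log(user, londoners);
    await kv.quit();
    await mongo.close();
}

demo().catch(console.error);
Even in this toy example the interfaces differ: the key-value store can only get and set blobs by key, while the document store accepts structured queries.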
Moar:
First, make sure you have a solid understanding of an SQL database, at the very least the SQL language (then dive into indices and, at last, fine-tuning). Without SQL knowledge, you are worthless on the job market. If you already have a good grasp of SQL, my recommendation is to forgo everything else but FoundationDB.
If you still want to "benchmark" databases, first set up a situation (real or imaginary), i.e. a project that you know well that requires a database. Try to fit several databases to solve the problems of that project.
Lastly, if you have a precise project in mind, try to answer the following questions, prior to asking another question on database-design:
What guarantees do you need? Question all the properties of ACID: Atomicity, Consistency, Isolation, Durability. Look into BASE. You do not necessarily need ACID or BASE, but they are a well-documented basis for figuring out where you want / need to go.
What is the size of the data?
What is the shape of the data? Are the types well defined? Are they polymorphic (heterogeneous shapes)?
Workload: write-once then read-only, mostly reads, mostly writes, or a mix of both? Also answer the question of how fast or slow writes and reads may be.
Querying: what do the queries look like: recursive / deep, columns or rows, or neighborhood queries (like GraphQL, and SQL without recursive queries, do)? Again, what is the expected response time?
Do not forget to at least review deployment and scaling strategies before committing to a particular solution.
On my side, I picked FoundationDB because it is the most versatile in those regards, even if at the moment it requires some code to be a drop-in replacement for all PostgreSQL features.

Using Postgres' external procedural languages over application code [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I am trying to figure out the advantages and disadvantages of using non-PL/pgSQL procedural languages (PL/Python, PL/Perl, PL/v8, etc.) to implement data manipulation logic at the database level, instead of going up to the model/ORM level of the application framework that interacts with the database (Rails, Entity Framework, Django, etc.) and implementing it there.
To give a concrete example, say, I have a table that contains Mustache templates, and I want to have them "rendered" somehow.
Table definition:
create table templates (
    id serial primary key,
    content text not null,
    data jsonb not null
);
Usually I would go to the model code and add an extra method to render the template. Example in Rails:
class Template < ApplicationRecord
  def rendered
    Mustache.render(content, data)
  end
end
However, I could also write a PL/Python function that does just that, but at the database level:
create or replace function fn_mustache(template text, data jsonb)
returns text
language plpython3u
as $$
import chevron
import json
return chevron.render(template, json.loads(data))
$$;
create view v_templates as
select id, content, data, fn_mustache(content, data) as rendered
from templates;
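A quick usage check (a sketch; it assumes the chevron package is installed for the server's Python):
-- Insert a sample template, then render it through the view.
insert into templates (content, data)
values ('Hello, {{name}}!', '{"name": "World"}');

select rendered from v_templates;
-- rendered: Hello, World!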
This yields virtually the same result functionality-wise. The example is very basic, yet the idea is to use PL/Python (or another language) to manipulate the data in a more advanced manner than PL/pgSQL allows for. That is, PL/pgSQL does not have the wealth of libraries that any generic programming language provides today (in the example I am relying on an implementation of the Mustache templating system, which would not be practical to implement in PL/pgSQL). I obviously would not use PL/Python for any sort of networking or other OS-level features, but for operations exclusively on data this seems like a decent approach (change my mind).
Points that I can observe so far:
PL/Python is an "untrusted" language, which I guess makes it by definition more dangerous to write a function in, since you have access to syscalls; at least it feels like the cost of messing up a PL/Python function is higher than that of a mistake in the application layer, since the former is executed in the context of the database
The database approach is more extensible, since I am working at the level closest to the data, i.e. I am not scattering the presentation logic across multiple "tiers" (ORM and DB in this case). This means that if some other external service needs to interact with the data, I can plug it directly into the database, bypassing the application layer.
Implementing this at the model level just seems much simpler in execution
Supporting the application code variant seems easier as well, since there are fewer concepts to keep in mind
What are the other advantages and disadvantages of these two approaches? (e.g. performance, maintainability)
You are wondering whether to have application logic inside the database or not. This is to a great extent a matter of taste. In the days of yore, the approach to implement application logic in database functions was more popular, but today it is usually frowned upon.
Extreme positions in this debate are
The application is implemented in the database to the extent that the database functions produce the HTML code that is sent to the client.
The database is just a dumb collection of tables with no triggers or constraints beyond a primary key, and the application tries to maintain data integrity.
The best solution is typically somewhere in the middle, but where is largely a matter of taste. You see that this is a typical opinion-based question. However, let me supply some arguments that help you make a decision.
Points speaking against application logic in the database:
It makes it more difficult to port to another database.
It is more complicated to develop and debug database functions than client code. For example, you won't have as advanced debugging tools.
The database machine has to perform not only the normal database workload, but also the application code workload. And databases are harder to scale than application servers (you can't just spin up a second database to handle part of the workload).
PostgreSQL-specific: all database functions run inside a single database transaction, so you cannot implement functionality that requires more complicated transaction management (a sketch of this limitation follows below).
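A minimal sketch of that limitation (the table name is hypothetical; the error is raised when the function executes):
create or replace function close_books()
returns void
language plpgsql
as $$
begin
    update accounts set balance = 0;
    -- Transaction control is not allowed inside a function:
    commit;  -- ERROR: invalid transaction termination
end;
$$;
(Since PostgreSQL 11, procedures invoked with CALL may commit or roll back, but functions still cannot.)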
Points speaking for application logic in the database:
It becomes easier to port to another application server or client programming language.
Less data has to be transferred between client and server, which can make processing more efficient.
The software stack becomes shorter and the overall software architecture simpler.
My personal opinion is that anything that has to do with basic data integrity should be implemented in the database:
Have foreign keys and check constraints in the database. The application will of course also respect these rules (no point in triggering a database error), but it is good for data integrity to have a safety net.
If you have to keep redundant information in the database, use triggers to make sure that all copies of a datum are kept synchronized (a sketch follows below). This implicitly makes use of transactional atomicity.
Anything that is more complicated is best done in the application. Be wary of database functions that are very long or complicated. Exceptions can be made for performance reasons: perhaps some complicated report could not easily be written in pure SQL, and shipping all the raw data to the client is prohibitively expensive.
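As an illustration of the trigger point above, a minimal sketch with a made-up schema that keeps a denormalized item count synchronized:
create or replace function sync_item_count()
returns trigger
language plpgsql
as $$
begin
    -- Keep the redundant counter on orders in step with order_items.
    if tg_op = 'INSERT' then
        update orders set item_count = item_count + 1 where id = new.order_id;
    elsif tg_op = 'DELETE' then
        update orders set item_count = item_count - 1 where id = old.order_id;
    end if;
    return null;  -- the return value is ignored for AFTER row triggers
end;
$$;

create trigger trg_sync_item_count
after insert or delete on order_items
for each row execute function sync_item_count();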

Not recommended to use server-side functions in MongoDB, does this go for MapReduce as well?

The MongoDB documentation states that it is not recommended to use its stored functions feature. This question goes through some of the reasons, but they all seem to boil down to "eval is evil".
Are there specific reasons why server-side functions should not be used in a MapReduce query?
The system.js functions are available to Map Reduce jobs by default (https://jira.mongodb.org/browse/SERVER-8632 notes a slight glitch to that in 2.4.0rc).
They are not actually eval'd within the native V8/SpiderMonkey environment, so technically that part of the concern is also gone.
So no, there are no real problems; they will run as though native within that Map Reduce job and should run just as fast and as well as any other JavaScript you write. In fact, the system.js collection was designed primarily to house code for Map Reduce jobs; it was only later use that saw it turned into a hack for "stored procedures".
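A rough sketch of that intended use (the collection and field names are made up; note that mapReduce itself is deprecated in recent MongoDB releases):
// Store a helper that the map and reduce functions can call.
db.system.js.save({
    _id: "normalize",
    value: function (s) { return s.trim().toLowerCase(); }
});

// The stored function is available inside the Map Reduce job by default.
db.orders.mapReduce(
    function () { emit(normalize(this.category), this.amount); },
    function (key, values) { return Array.sum(values); },
    { out: "totals_by_category" }
);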

JavaScript Stored Function on MongoDB Server

This is related to JavaScript stored functions on the MongoDB server. I know all the details about how they work and their use cases. I am doubtful about one line in the official MongoDB documentation:
"Note: We do not recommend using server-side stored functions if possible."
In fact, what I feel is: after the move to the V8 JavaScript engine (which improved concurrency for JavaScript queries), and given the fact that this may save us many network round trips, why is this not recommended by 10gen?
This is not recommended due to the fact that the JavaScript function needs to take a write lock for the duration of its execution, meaning you'll cause potential bottlenecks in your write performance.
There are some disadvantages of stored procedures in general:
https://stackoverflow.com/questions/462978/when-should-you-use-stored-procedures
Yet I understand your point concerning the network roundtrips.