How to learn PostgreSQL and its internals? - postgresql

I am new to PostgreSQL, so when reading the source code of pg I get very confused. Is there any useful material on the PostgreSQL source code? Thank you.

There are some nice presentations about basic concepts such as Datum, V1 function calls, and the source code layout:
http://www.postgresql.org/developer/coding
http://www.postgresql.org/files/developer/internalpics.pdf
This master's thesis is also a very good document: http://www.ic.unicamp.br/~celio/livrobd/postgres/ansi_sql_implementation_postgresql.pdf
http://www.postgresql.org/developer/ext.backend_dirs.html

It seems you are also unversed in using the internet!? ;-) Your first stop should be the project homepage http://www.postgresql.org/. There you will find a "Developers" link which directs you to the available resources. One of them is the Developer FAQ, which should be more than sufficient for getting started.

The Internals of PostgreSQL also seems to be a very useful book.

A useful resource for Postgres 14 can be found here: Postgres Professional. It discusses isolation, MVCC, the buffer cache, WAL, locks, and query execution.
Also see the free two-day course here: PostgresPro Course.

Related

PPDB paraphrase searching

There is a well-known lexical resource of paraphrases, PPDB.
It comes in several sizes, ranging from highest precision to highest recall. The biggest set, XXXL, contains ~5 GB of paraphrase data.
I want to use PPDB for my research and I wonder what the best engine is for searching such a big resource. I haven't tried it yet, but I think using it as a flat file is not a good idea.
I was thinking about exporting all the data to MongoDB, but I am not sure whether this is the best solution.
Please share any ideas you have.
Thank you.
You need to consider the following aspects:
1. For your use-case you will need a schemaless database
2. Transactions not required
3. Fast queries/searching
4. Easy to setup and deploy
5. Ability to handle large volumes of data
All of the above aspects point toward MongoDB.
You will have some teething troubles exporting the data to MongoDB, but it is definitely worth the effort. Your data model can be as simple as {key: [value1, value2, ...]} for each document.
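For what it's worth, here is a minimal sketch of what that import could look like with pymongo. It assumes a local mongod, that PPDB lines are '|||'-separated, and the file path, database, collection, and field names are purely illustrative:

    # Minimal sketch: load PPDB pairs into MongoDB and index the source phrase.
    # Assumes a local mongod, pymongo installed, and '|||'-separated PPDB lines;
    # the file path, database, and collection names are illustrative.
    from pymongo import MongoClient, ASCENDING

    client = MongoClient("mongodb://localhost:27017")
    coll = client["ppdb"]["paraphrases"]

    batch = []
    with open("ppdb-xxxl.txt", encoding="utf-8") as f:   # hypothetical file name
        for line in f:
            fields = [p.strip() for p in line.split("|||")]
            if len(fields) >= 3:
                batch.append({"key": fields[1], "paraphrase": fields[2]})
            if len(batch) >= 10000:
                coll.insert_many(batch)   # insert in bulk to keep it reasonably fast
                batch = []
    if batch:
        coll.insert_many(batch)

    coll.create_index([("key", ASCENDING)])   # fast lookups by source phrase
    print([d["paraphrase"] for d in coll.find({"key": "automobile"})])

With the index on key, lookups stay fast even for the XXXL set; if you prefer the one-array-per-key model described above, the per-pair documents can be grouped afterwards with an aggregation.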

MongoDB for personal non-distributed work

This might have been answered here (or elsewhere) before, but I keep getting mixed or no views on the internet.
I have never used anything except SQL-like databases, and then I came across NoSQL DBs (MongoDB, specifically). I tried my hand at it. I was doing it just for fun, but everywhere the talk is that it is really great when you are using it across distributed servers. So I wonder: is it helpful (in a non-trivial way) for small projects and things done mainly on a personal computer? Are there real advantages when there is just one server?
Although it would be cool to use MapReduce (and talk about it to peers :d), won't it be overkill when used for small projects run on single servers? Or are there other advantages? I need some clear thoughts. Sorry if I sound naive here.
Optional: some examples of where/how you have used it would be great.
Thanks.
IMHO, MongoDB is perfectly valid for single-server/small projects, and it's not a prerequisite that you only use it for "big data" or multi-server projects.
If MongoDB solves a particular requirement, the scale of the project doesn't matter, so don't let that aspect sway you. Using MapReduce may be overkill/not the best approach if you truly have low-volume data and just want to do some basic aggregations; these can be done using the group operator (which currently has some limitations with regard to how much data it can return), as in the sketch after this answer.
So I guess what I'm saying in general is: use the right tool for the job. There's nothing wrong with using MongoDB on small projects/single PCs. If an RDBMS like SQL Server provides a better fit for your project, then use that. If a NoSQL technology like MongoDB fits, then use that.
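As an aside, in current MongoDB versions the group command has been superseded by the aggregation pipeline, which covers the same simple cases. A minimal sketch with pymongo, where the database, collection, and field names are made up for illustration:

    # Minimal sketch: a simple per-customer total on a single server.
    # Assumes a local mongod and pymongo; the "orders" collection and its
    # fields are illustrative only.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    orders = client["shop"]["orders"]

    pipeline = [
        {"$match": {"status": "shipped"}},                              # filter first
        {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}}, # group and sum
        {"$sort": {"total": -1}},
        {"$limit": 10},
    ]
    for row in orders.aggregate(pipeline):
        print(row["_id"], row["total"])

The pipeline runs server-side, so no MapReduce machinery is needed for simple totals like this.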
+1 on AdaTheDev, but there are 3 more things to note here:
Durability: from version 1.8 onwards, MongoDB has single-server durability when started with --journal, so it is now more applicable to single-server scenarios.
Choosing a NoSQL DB over, say, an RDBMS shouldn't be decided by the single- or multi-server setting, but by how the data is modelled. See for example 1 and 2: it's easy to store comment-like structures in MongoDB.
MapReduce: again, it depends on the data modelling and the operation/calculation that needs to occur. Depending on how you model your data, you may or may not need MapReduce.

Smart way to evaluate what is the right NoSQL database for me?

There appears to be a myriad of NoSQL databases available these days:
CouchDB
MongoDB
Cassandra
Hadoop
There's also a boundary between these tools and tools such as Redis that work as a memcached replacement.
Without hand-waving or throwing around too many buzzwords, my question is the following:
How does one intelligently decide which tool makes the most sense for their project? Are these projects similar enough that the answer is subjective, e.g. "Ruby is better than Python" or "Python is better than Ruby"? Or are we talking apples and oranges here, in that each of them solves a different problem?
What's the best way to educate myself on this new trend?
Perhaps one way to think of it is this: programming has recently evolved from using one general-purpose language for everything to using a general-purpose language for most things, plus domain-specific languages for the parts they suit better. For example, you might use Lua to script the artificial intelligence of a character in a game.
NoSQL databases might be similar. SQL is the general-purpose database with the longest and broadest adoption. While it can be shoehorned into serving many tasks, programmers are beginning to use NoSQL as a domain-specific database where it is more appropriate.
I would argue that the four major players you named have quite different feature sets and try to solve different problems with different priorities.
For instance, as far as I know, Cassandra's (and I assume Hadoop's) central focus is on large-scale installations.
MongoDB tries to be a better-scaling alternative to classic SQL servers while providing comparably powerful query functions.
CouchDB's focus is comparatively small scale (it will not shard at all, "only" replicate), with high durability and easy synchronization of data.
You might want to check out http://nosql-database.org/ for some more information.
I am facing pretty much the same problem as you, and I would say there is no real alternative to looking at all the solutions in detail.
Check out this site: http://cattell.net/datastores/ and in particular the PDF linked at the bottom (CACM Paper). The latter contains an excellent discussion of the relative merits of various data store solutions.
It's easy. NoSQL databases are ACID-compliant databases minus some guarantees. So just decide which guarantees you can do without and find the database that fits. If you don't need durability, for example, maybe Redis is best. Or if you don't need multi-record transactions, then perhaps look into MongoDB.

(Non-Relational) DBMS Design Resource

As a personal project, I'm looking to build a rudimentary DBMS. I've read the relevant sections in Elmasri & Navathe (5th ed.), but could use a more focused text, something a bit more practical and detail-oriented with real-world recommendations, as E&N only goes so deep.
The rub is that I want to play with novel non-relational data models. While a lot of E&N was great (the indexing implementation details in particular), the more advanced DBMS implementation material targets only the relational model.
I'd like to put off staring at DBMS source code for a while, if I can, until I have a better foundation. Any ideas?
First of all, you have to understand the properties of each system. I suggest reading this post; it's the first step to understanding NoSQL, or "Not Only SQL". Secondly, you can check this blog post to understand all of this visually.
Finally, glance at open-source projects such as MongoDB, CouchDB, etc.; for a list you can go here.
Actually, the first step would be to understand the hierarchical, network, navigational, and object models, which are alternatives to the relational model. I'm not sure where XML fits in, i.e. what model it is. As far as structure goes, research B-tree (not binary tree) implementations; a small sketch follows below.
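To make that last pointer concrete, here is a minimal, purely illustrative in-memory sketch of a B-tree node and lookup; real DBMS B-trees store nodes as fixed-size disk pages and add splitting, merging, and locking on top of this:

    # Minimal, in-memory B-tree search sketch (no insertion or splitting shown).
    # Real database B-trees store nodes as fixed-size disk pages and add
    # balancing and locking; this only illustrates the node layout and lookup.
    from bisect import bisect_left

    class BTreeNode:
        def __init__(self, keys, children=None):
            self.keys = keys                # sorted keys held in this node
            self.children = children or []  # internal nodes: len(children) == len(keys) + 1

        @property
        def is_leaf(self):
            return not self.children

    def search(node, key):
        """Return True if key is present in the subtree rooted at node."""
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if node.is_leaf:
            return False
        return search(node.children[i], key)

    # Tiny hand-built example tree:
    root = BTreeNode([10, 20], [
        BTreeNode([3, 7]),
        BTreeNode([12, 15]),
        BTreeNode([25, 30]),
    ])
    print(search(root, 15), search(root, 16))   # -> True False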

Performance Tuning

How can I find a query that has been running for a long time, and what are the steps to tune it? (Oracle)
Run EXPLAIN PLAN FOR SELECT ... to see what Oracle is doing with your query; a small sketch follows below.
Post your query here so that we can look at it and help you out.
Check out the Oracle Performance Tuning FAQ for some tricks-of-the-trade, if you will.
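Here is a minimal sketch of the EXPLAIN PLAN step driven from Python, assuming the python-oracledb driver; the connection details and the sample query are placeholders, and the same two statements work verbatim in SQL*Plus:

    # Minimal sketch: generate and display an Oracle execution plan from Python.
    # Assumes the python-oracledb driver; connection details and the sample
    # query are placeholders.
    import oracledb

    conn = oracledb.connect(user="scott", password="tiger", dsn="localhost/orclpdb1")
    cur = conn.cursor()

    cur.execute("EXPLAIN PLAN FOR SELECT * FROM emp WHERE deptno = 10")
    cur.execute("SELECT plan_table_output FROM TABLE(DBMS_XPLAN.DISPLAY())")
    for (line,) in cur:
        print(line)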
You can capture the query by selecting from v$sql or v$sqltext.
If you are not familiar with it, look up 'Explain Plan' in the Oracle documentation; there should be plenty on it in the performance tuning guide.
Have a look at Quest Software's Toad for a third-party tool that helps in this area too.
Unfortunately your question is not expressed clearly. The other answers have already tackled the issue of tuning a known bad query, but another interpretation is that you want to monitor your database to find poorly performing queries.
If you don't have Enterprise Edition with the Diagnostics Pack, and not many of us do, your best bet is to run Statspack snapshots on a regular basis. This will give you a lot of information about your system, including which queries take a long time to complete and which queries consume a lot of your system's resources. You can find out more about Statspack here.
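For completeness, a minimal sketch of taking a Statspack snapshot programmatically; it assumes Statspack is already installed under the PERFSTAT schema and uses the python-oracledb driver with placeholder connection details. In practice snapshots are usually scheduled with DBMS_SCHEDULER and reports are produced with spreport.sql:

    # Minimal sketch: take a Statspack snapshot from Python.
    # Assumes Statspack is installed (PERFSTAT schema) and python-oracledb;
    # connection details are placeholders. Reports between two snapshots are
    # then produced with $ORACLE_HOME/rdbms/admin/spreport.sql in SQL*Plus.
    import oracledb

    conn = oracledb.connect(user="perfstat", password="perfstat", dsn="localhost/orclpdb1")
    cur = conn.cursor()
    cur.callproc("statspack.snap")   # record one snapshot of instance statistics
    conn.commit()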
If you do not want to use OEM, then you can query and find out yourself.
First, find the long-running query. If it is currently executing, you can join gv$session with gv$sql to find which sessions have been running for a long time and to get the SQL details; look at the last_call_et column (see the sketch below). If the SQL executed some time in the past, you can use the dba_hist_snapshot, dba_hist_sqlstat, and dba_hist_sqltext tables to find the offending SQL.
Once you have the query, you can check what plan it is picking: from the dba_hist_sql_plan table if the SQL executed in the past, or from gv$sql_plan if it is currently executing.
Now analyze the execution plan and see whether it is using the right indexes, joins, etc.
If not, tune those.
Let me know which step you have a problem with; I can help answer it.
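Putting the "currently executing" part together, here is a minimal sketch (python-oracledb, with placeholder connection details and sufficient privileges assumed; the SELECT itself can be run as-is in SQL*Plus or SQL Developer):

    # Minimal sketch: list active sessions that have been running for a while,
    # joined to their SQL text. Assumes python-oracledb and privileges to read
    # gv$session / gv$sql; connection details are placeholders.
    import oracledb

    conn = oracledb.connect(user="system", password="oracle", dsn="localhost/orclpdb1")
    cur = conn.cursor()

    cur.execute("""
        SELECT s.inst_id, s.sid, s.username, s.last_call_et, q.sql_id, q.sql_text
        FROM   gv$session s
        JOIN   gv$sql q
               ON  q.sql_id       = s.sql_id
               AND q.inst_id      = s.inst_id
               AND q.child_number = s.sql_child_number
        WHERE  s.status = 'ACTIVE'
          AND  s.username IS NOT NULL
          AND  s.last_call_et > 600          -- active for more than 10 minutes
        ORDER  BY s.last_call_et DESC
    """)
    for row in cur:
        print(row)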