Using Apache AGE with PostgreSQL?

I have just started with Apache AGE and am wondering what the main differences are between using PostgreSQL alone and using Apache AGE with PostgreSQL for data processing. I know Apache AGE is a graph database extension for Postgres, but what is the benefit of using Apache AGE with Postgres?

Apache AGE is an extension for PostgreSQL that enables users to leverage a graph database on top of their existing relational databases. AGE is an acronym for A Graph Extension, and the project is inspired by AgensGraph, a multi-model database fork of PostgreSQL. The basic principle of the project is to create a single storage layer that handles both the relational and graph data models, so that users can use standard SQL alongside openCypher, one of the most popular graph query languages today.
For reference and more information, you can visit the Apache AGE GitHub repository.

PostgreSQL is a relational database management system (RDBMS), while AGE is an extension on top of PostgreSQL that adds the functionality of a graph database. With PostgreSQL alone we cannot create a graph, add nodes and edges to it, and query it as a graph; that is why we use Apache AGE with PostgreSQL.
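As a minimal sketch of what that looks like in practice (the graph name, labels, and properties below are invented for illustration; the LOAD/cypher() pattern follows the Apache AGE documentation):

LOAD 'age';
SET search_path = ag_catalog, "$user", public;

-- Create a graph (the name is just an example)
SELECT create_graph('my_graph');

-- Create two vertices and an edge between them using openCypher
SELECT * FROM cypher('my_graph', $$
    CREATE (:Person {name: 'Alice'})-[:KNOWS]->(:Person {name: 'Bob'})
$$) AS (result agtype);

-- Query the graph: who does Alice know?
SELECT * FROM cypher('my_graph', $$
    MATCH (:Person {name: 'Alice'})-[:KNOWS]->(b)
    RETURN b.name
$$) AS (name agtype);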

Apache AGE basically enhances PostgreSQL's relational database capabilities by incorporating graph database features. Data can be stored, accessed, and analyzed as a graph using Apache AGE, which is especially helpful for large, interconnected data sets. Using AGE, users may model and query relationships between data by using graph database features including nodes, edges, and properties.
Also, AGE integrates with PostgreSQL's SQL engine, which means that users can leverage their existing knowledge of SQL to query and analyze graph data. For visualization, you can use Apache AGE Viewer.
AGE also supports many of PostgreSQL's advanced SQL features, such as window functions and CTEs (common table expressions).
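For instance, since a cypher() call behaves like any set-returning function, you can wrap it in a CTE and apply ordinary SQL aggregation and window functions to the graph results. A minimal sketch, assuming a hypothetical graph my_graph with Person vertices and KNOWS edges:

WITH friends AS (
    SELECT name
    FROM cypher('my_graph', $$
        MATCH (p:Person)-[:KNOWS]->(f)
        RETURN p.name
    $$) AS (name agtype)
)
-- Rank people by how many others they know, using plain SQL on top
SELECT name,
       count(*) AS known,
       rank() OVER (ORDER BY count(*) DESC) AS rnk
FROM friends
GROUP BY name;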
You can check their website for more details.

Although the other answers are essentially correct, I want to provide a bit of context.
1. Apache AGE is a powerful open-source extension for Postgres that adds graph database functionality to the relational database.
To understand this better, you should know what graph databases are; see the linked article on graph databases to learn more. In short, you can leverage open-source extensions like Apache AGE to extend Postgres's capabilities and model complex relationships in your data.
This combination is particularly useful in scenarios where data is both structured and interconnected, such as social networks, recommendation engines, or fraud detection systems.
The following use cases of Apache AGE should further clear things up. I hope this helps! Let me know if you have any additional questions.
Use Cases of Apache AGE:
Ability to store and query graph data using SQL
Combining the strengths of both graph databases and relational databases
Efficiently managing structured and interconnected data
Finding insights and relationships that might be difficult to find using traditional SQL queries alone (a sketch of such a query follows below).
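As a sketch of that last point: "find everyone within three hops of Alice" needs a recursive CTE in plain SQL, but is a single variable-length pattern in openCypher (graph and label names are again invented):

SELECT * FROM cypher('my_graph', $$
    MATCH (:Person {name: 'Alice'})-[:KNOWS*1..3]->(other)
    RETURN DISTINCT other.name
$$) AS (name agtype);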

Using Apache Age with PostgreSQL can provide several benefits, such as:
Graph Database Functionality: With Apache Age, users can add graph database functionality to their PostgreSQL database. This allows them to model and store data in a way that is better suited for graph data, as opposed to traditional relational database structures.
Improved Querying: Apache AGE provides a graph query language called openCypher, which is specifically designed for querying graph data. This can make it easier to query complex and interconnected data, and can provide better performance for certain types of queries.
Integration with Existing PostgreSQL Systems: Apache AGE is an extension for PostgreSQL, which means that it integrates seamlessly with existing PostgreSQL systems. Users can continue to use their existing tools and interfaces, and can easily incorporate graph database functionality into their existing workflows. There are many more benefits besides these.

Related

Data mining with postgres in production environment - is there a better way?

There is a web application which has been running for years, and during its lifetime the application has gathered a lot of user data. The data is stored in a relational DB (postgres). Not all of this data is needed to run the application (to do the business). However, from time to time business people ask me to provide reports on this data. And this causes some problems:
sometimes these SQL queries are long-running
queries are executed against the production DB (not cool)
not so easy to deliver reports on a weekly or monthly basis
some parts of the data are stored in a way that is not suitable for such querying (queries are inefficient)
My idea (note that I am a developer, not a data mining specialist) for improving this whole process of delivering reports is:
create a separate DB which is regularly updated with production data
optimize how data is stored
create a dashboard to present reports
Question: But is there a better way? Is there another DB which is a better fit for such data analysis? Or should I look into modern data mining tools?
Thanks!
Do you really do data mining (as in: classification, clustering, anomaly detection), or is "data mining" for you any reporting on the data? In the latter case, all the "modern data mining tools" will disappoint you, because they serve a different purpose.
Have you used the indexing functionality of Postgres well? Your scenario sounds as if selection and aggregation are most of the work, and SQL databases are excellent for this - if well designed.
For example, materialized views and triggers can be used to process the data into a schema more usable for your reporting.
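Concretely, a materialized view can compute an expensive report query once, be indexed like a table, and then be refreshed on a schedule. A minimal sketch with invented table and column names:

-- Pre-compute a monthly signups report
CREATE MATERIALIZED VIEW monthly_signups AS
SELECT date_trunc('month', created_at) AS month,
       count(*) AS signups
FROM users
GROUP BY 1;

-- Index the view so report queries stay fast
CREATE INDEX ON monthly_signups (month);

-- Re-run the stored query periodically (e.g. nightly via cron or pg_cron)
REFRESH MATERIALIZED VIEW monthly_signups;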
There are a thousand ways to approach this issue but I think that the path of least resistance for you would be postgres replication. Check out this Postgres replication tutorial for a quick, proof-of-concept. (There are many hits when you Google for postgres replication and that link is just one of them.) Here is a link documenting streaming replication from the PostgreSQL site's wiki.
I am suggesting this because it meets all of your criteria and also stays within the bounds of the technology you're familiar with. The only learning curve would be the replication part.
Replication solves your issue because it creates a second database which effectively becomes your "read-only" DB, updated continuously via the replication process. One caveat: with physical streaming replication the standby is an exact copy of the primary, so you cannot add different indexes or customized report tables on it; if you need those, look at logical replication instead (sketched below). This read-only copy is the database you would query. Your main database remains your transactional database which serves the users, and the replicated database serves the stakeholders.
This is a wide topic, so please do your due diligence and research it. But it's also something that can work for you and can be turned around quickly.
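If you go the logical replication route (available since PostgreSQL 10), the setup is plain SQL. A minimal sketch, with placeholder host, table, and role names:

-- On the production (primary) server:
CREATE PUBLICATION reporting_pub FOR TABLE users, orders;

-- On the reporting server (the tables must already exist there):
CREATE SUBSCRIPTION reporting_sub
    CONNECTION 'host=prod-db dbname=app user=replicator password=secret'
    PUBLICATION reporting_pub;

-- The copy now stays up to date, and you are free to add
-- whatever extra indexes your reports need:
CREATE INDEX ON orders (created_at);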
If you really want to try data mining with PostgreSQL, there are some tools you can use.
A very simple way is KNIME. It is easy to install and has full-featured data mining tools. You can access your data directly from the database, process it, and save it back to the database.
The hardcore way is MADlib. It installs data mining functions written in Python and C directly in Postgres, so you can mine with SQL queries.
Both projects are stable enough to try.
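To give a flavor of MADlib: once installed, training a model is a plain SQL function call. The linregr_train call below follows the MADlib documentation; the table and column names are invented:

-- Train a linear regression model on an existing table
SELECT madlib.linregr_train(
    'houses',                      -- source table
    'houses_model',                -- output table for the fitted model
    'price',                       -- dependent variable
    'ARRAY[1, tax, bath, size]'    -- independent variables (1 = intercept)
);

-- Inspect the fitted coefficients and goodness of fit
SELECT coef, r2 FROM houses_model;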
For reporting, we use a non-transactional (read-only) database. We don't care about normalization. If I were you, I would use another database for reporting. I would design the tables following OLAP principles (star schema, snowflake) and use an ETL tool to dump the data periodically (maybe weekly) into the read-only database, and start creating reports from there.
Reports are used for decision support, so they don't have to be real-time, and usually don't have to be current. In other words, it is acceptable to create reports as of last week or last month.
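A minimal star-schema sketch of what such a reporting database might look like (all names invented): a central fact table holds the numeric measures, with foreign keys into small descriptive dimension tables.

-- Dimension tables: attributes to slice reports by
CREATE TABLE dim_date (
    date_key  integer PRIMARY KEY,  -- e.g. 20240131
    full_date date NOT NULL,
    month     integer NOT NULL,
    year      integer NOT NULL
);

CREATE TABLE dim_customer (
    customer_key serial PRIMARY KEY,
    name         text NOT NULL,
    region       text NOT NULL
);

-- Fact table: one row per order, measures only
CREATE TABLE fact_orders (
    date_key     integer REFERENCES dim_date,
    customer_key integer REFERENCES dim_customer,
    amount       numeric NOT NULL
);

-- A typical report: revenue per region per year
SELECT c.region, d.year, sum(f.amount) AS revenue
FROM fact_orders f
JOIN dim_date d     ON d.date_key = f.date_key
JOIN dim_customer c ON c.customer_key = f.customer_key
GROUP BY c.region, d.year;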

Advantage of ontology over RDBMS in an offline application

Is there any advantage to using an ontology-based database (linked data) instead of an RDBMS in an offline application? Does linked data provide more relations and reasoning capabilities through SPARQL than SQL does? Can I not achieve the same using joins in SQL?
Suppose I am storing the details of various mobile phones. This database should answer user centric queries like
1. a list of all mobiles with a good (quantified) touch interface
2. mobiles similar to the Samsung Galaxy S4
Can I not retrieve efficient results using an RDBMS itself with joins? If the answer is yes, would the performance of answering these queries be the real point of comparison between the two database models? Basically, what is the edge that I get by using ontologies in such scenarios?
The main advantage of using ontologies is the formalized semantics. This way a reasoner can automatically infer new statements without writing specific code.
But it is true that you can model any linked data in an RDBMS as well, and the other way around. The same holds for querying with SPARQL or SQL: you can achieve the same results. SPARQL has some advantages when your SQL query requires multiple joins; such queries can be expressed in a far more meaningful way in SPARQL.
The disadvantage of ontology-based databases nowadays is still a lack of performance in comparison to RDBMSs.
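To make the multiple-joins point concrete with the phone example from the question: in an RDBMS, flexible attributes often end up in an entity-attribute-value table, and every attribute you filter on costs one more self-join (the schema below is invented for illustration):

-- One row per (phone, attribute) pair:
-- phone_attr(phone_id int, attr text, val text)

-- "Phones similar to the Galaxy S4": same OS and same screen size
SELECT os.phone_id
FROM phone_attr os
JOIN phone_attr screen ON screen.phone_id = os.phone_id
WHERE os.attr = 'os'         AND os.val = 'Android'
  AND screen.attr = 'screen' AND screen.val = '5.0';
-- Each additional attribute to match means another self-join;
-- the equivalent SPARQL is just one triple pattern per attribute.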

OLAP and postgresql - tool or methodology?

I was reviewing some documents on making my database perform better and I came across the term "OLAP" pre-aggregation. I was wondering if OLAP is a tool, a methodology, or an approach. For example, my DBMS is postgresql and I am working on a big database. To speed things up I have to use some aggregation and pre-aggregation methods. How can OLAP be helpful?
OLAP is a database role. When storing OLAP data in the db, typically you aren't running live transactional information off the db, but rather keeping it around for analytical and business intelligence reasons.
It isn't a tool. It isn't an approach either, since some approaches are needed for OLAP while some of them are helpful in transactional environments as well.
In general you shouldn't think about speeding up an application by incorporating OLAP into it. Instead you would look at separating reporting functions out onto a separate DB server, importing the data periodically, separating data feeds from operational data stores, and so on. This is a very different field from transactional application development.
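As a concrete example of pre-aggregation in plain PostgreSQL (all names invented): compute the expensive aggregates once into a summary table, optionally with GROUP BY ROLLUP to get subtotals and grand totals in a single pass, and point the reports at the small table.

-- Build the pre-aggregated summary (re-run periodically)
CREATE TABLE sales_summary AS
SELECT region,
       product,
       sum(amount) AS total_amount
FROM sales
GROUP BY ROLLUP (region, product);  -- adds per-region and grand totals

-- Reports now read the tiny summary instead of the raw data
SELECT region, total_amount
FROM sales_summary
WHERE product IS NULL AND region IS NOT NULL;  -- per-region subtotals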

Java ORMs on NoSQL DB like HBase

I have recently started getting familiar with NoSQL (HBase). I am definitely a noob.
I was investigating ORMs and high-level clients which can be used with HBase and came across a few.
Some ORM libraries like Kundera provide SQL-like data query functionality. I find this a little counterintuitive.
Can anyone help me understand why we would again need SQL-like querying if the whole objective was to move away from it?
Also, can anyone comment on their experiences with ORMs for HBase? I looked at a few of them from http://wiki.apache.org/hadoop/SupportingProjects and started looking at Kundera.
Another related question - does data querying with Kundera run MapReduce jobs internally?
Kundera or Spring Data might provide a user-friendly ORM layer over NoSQL databases, but the underlying entity model still has to be NoSQL-friendly. This means that NoSQL users should not blindly follow RDBMS modeling strategies, but should design ORM entities in such a way that all NoSQL capabilities can be used.
As a rule of thumb, Kundera ORM entities should be designed with a query-first strategy: define the queries first so as to choose the primary keys, and keep the use of the relationship model to a minimum. Querying on random columns and full scans should be avoided, so data might have to be replicated across entities to reduce multiple entity lookups. Also, transaction management needs to be planned. FYI, Kundera does not support transactions (beyond the single-row TX supported by HBase/Cassandra).
Reason for using Kundera:
1) If you are looking for SQL-like support over HBase. As Kundera is built on top of the HBase native API, it simply transforms these SQL-like queries into corresponding GET or PUT method calls.
2) Currently it supports HBase 0.20.6 only. Kundera 2.0.6 will enable support for HBase 0.90.x versions.
3) Kundera does not do something out of the box to provide MapReduce over SQL-like queries. However, support for such a thing will be provided in Kundera 2.0.6 by enabling support for Hive native queries only!
It is fully JPA compliant, so there is no need to learn something new. It simply hides complexity at the developer level with very minimal effort.
SQL-like querying is for ease of development, quick development, fewer errors, and reusability, of course!
-Vivek

Jira using enterprise architecture by OFBiz

The 'Open For Business' project (OFBiz) is an enterprise framework.
It so happens that Jira uses this, and I was pretty shocked at how much work is involved to pull data for a particular entity (say an issue/bug in Jira's case).
Imagine getting a list of all the issues: it first has to get all the columns (or properties) to display for the table, then pull in the values for each. For an enterprise framework this sounds sub-optimal (but I understand how it adds flexibility).
You can read how it's used in Jira practically: http://confluence.atlassian.com/display/JIRA/Database+Schema
main site: http://ofbiz.apache.org/docs/entity.html
I'm just confused as to how to list all issues. Meaning, what would the SQL queries look like?
It's one thing to pull a single issue, but to get a list you have to do a lot of work to get the values. I don't think it can be done with a single query using joins, now can it?
(Disclaimer: I work for Atlassian, but I'm not on the JIRA team)
OFBiz EE is just an abstraction layer for moving between database tables and fancy maps called GenericValues. It has no influence over the database schema itself. Your real issue here seems to be that JIRA's database schema is complicated.
The reason it's complicated is because it has to support a data model where an issue is an arbitrary collection of arbitrary fields, at some point in an arbitrary workflow. The fields themselves can be defined by third-party plugins. It's very hard to produce a friendly-looking RDBMS schema to fit this kind of dynamic data model, and JIRA tries as best it can.
You can get information directly out of the database if you want; the database schema is documented in the link above. Or you can go up a layer or twelve of abstraction and talk through one of JIRA's many APIs.
A good place to ask questions about getting data out of JIRA is the forums on http://forums.atlassian.com/
The entity engine used in JIRA is a database abstraction layer (with a very rich and easy-to-use API) that connects your application with one or more data sources. But the databases are still relational, so you can use SQL if you want to. As for the issue info you want to pull, I'd say it wouldn't be very easy with joins alone. I'd recommend using the procedural language of the RDBMS (e.g. PL/SQL, PL/pgSQL).
SELECT * FROM jiraissue;
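To give a more concrete idea of a listing query, here is a hedged sketch against the documented JIRA schema. The jiraissue, project, and issuetype tables are documented, but exact column names vary between JIRA versions, so treat these as assumptions to verify against your own instance (and note that custom fields would need extra joins per field):

-- List issues with their project and issue type names
SELECT p.pname   AS project,
       i.id      AS issue_id,
       t.pname   AS issue_type,
       i.summary AS summary
FROM jiraissue i
JOIN project   p ON p.id = i.project
JOIN issuetype t ON t.id = i.issuetype
ORDER BY p.pname, i.id;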