Perform a join across 2 DBs - PostgreSQL

I have 2 databases on the same server, with the same user:
ckan_default
datastore_default
The relations in ckan_default are:
Schema | Name | Type | Owner
-------+-------------------------------+-------+----------
public | resource | table | ckanuser
public | resource_group | table | ckanuser
public | package | table | ckanuser
....
The relations in datastore_default are:
Schema | Name | Type | Owner
-------+--------------------------------------+-------+----------
public | 1bc7932e-2507-467b-8c12-c9f321b760f7 | table | ckanuser
public | 449138df-e089-41f2-8939-dcee53a31bc1 | table | ckanuser
public | 7235f781-1b16-4abf-ac04-8d68fa62e432 | table | ckanuser
....
I want to JOIN the 2 DBs ON ckan_default.resource.id = datastore_default."NAME OF RELATION".
How?

I don't think you can.
You can use the dblink extension to query database B from A, but the query runs outside database A's data context; this is how PostgreSQL works (each connection is bound to a single database).
EDIT: you can populate a view from the result of a dblink query, and then use it like a local table:
CREATE VIEW myremote_pg_proc AS
SELECT *
FROM dblink('dbname=postgres', 'select proname, prosrc from pg_proc')
AS t1(proname name, prosrc text);
SELECT * FROM myremote_pg_proc WHERE proname LIKE 'bytea%';
There are more examples in the link I posted.
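If installing dblink is not an option, the join can also be done in the application: query each database over its own connection and match the rows in memory. The Python sketch below assumes the rows were already fetched; the driver code is not shown, and the second resource id and the resource names are hypothetical. On the datastore side, the table names could come from a query such as SELECT tablename FROM pg_catalog.pg_tables WHERE schemaname = 'public', run against datastore_default.

```python
from typing import Dict, List

# Hypothetical rows, assumed to be fetched from ckan_default.resource
# over one connection (driver code not shown).
resources: List[Dict[str, str]] = [
    {"id": "1bc7932e-2507-467b-8c12-c9f321b760f7", "name": "population"},
    {"id": "00000000-0000-0000-0000-000000000000", "name": "no datastore table"},
]

# Table names assumed to be fetched from datastore_default over a second
# connection, e.g. from pg_catalog.pg_tables.
datastore_tables: List[str] = [
    "1bc7932e-2507-467b-8c12-c9f321b760f7",
    "449138df-e089-41f2-8939-dcee53a31bc1",
]


def join_resources_to_tables(
    resources: List[Dict[str, str]], table_names: List[str]
) -> List[Dict[str, str]]:
    """Inner join on resource.id = table name, done in memory."""
    names = set(table_names)  # set lookup keeps the join linear in row count
    # Table names are unique within a schema, so no resource is duplicated.
    return [r for r in resources if r["id"] in names]


matched = join_resources_to_tables(resources, datastore_tables)
print(matched)
```

This moves both result sets over the network. For large tables, the dblink approach above keeps the work inside the server.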

PL/Proxy is another option, similar to dblink. I have used it in the past to talk between servers, where my use case was a poor man's distributed database cluster. The data on the other servers was pulled in for certain large reports, and it worked pretty well. The servers were all in the same colocation, though, so if the other databases are geographically spread out, you will pay an additional penalty for network latency and data transfer times.

Related

Is it possible to create a graph in AGE using existing tables in the database?

I have just started with the Apache AGE extension, and I am exploring the functionality of graph databases. Is there a way to create a graph from existing tables/schemas such that the table becomes the label and the attributes become the properties of the vertex?
The create_graph('graph name') function is used for creating graphs, but it only lets me create a new graph.
It's not as simple as that. To start, you have to understand the following.
When deriving a graph model from a relational model, keep these general guidelines in mind:
A row is a node.
A table name is a label name.
A join or foreign key is a relationship.
Using those relationships, you can model out the data. Planning the model this way first is the approach to take if you need to make sure the graph is built without errors.
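To make the three guidelines concrete, here is a small Python sketch that applies them to rows already read from a table. Each row becomes a node labelled with the table name, its columns become properties, and a foreign-key column becomes a relationship. The function name, the REPORTS_TO edge label, and the choice to drop the foreign-key column from the properties are assumptions for illustration. The sample rows come from the employees table used further down.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def rows_to_graph(
    label: str,
    rows: List[Row],
    key: str = "id",
    fk_column: Optional[str] = None,
    edge_label: Optional[str] = None,
) -> Tuple[List[Row], List[Row]]:
    """Hypothetical helper: map relational rows to graph nodes and edges."""
    nodes: List[Row] = []
    edges: List[Row] = []
    for row in rows:
        # A row is a node, and the table name is its label. The foreign-key
        # column is left out of the properties because it becomes an edge.
        props = {col: val for col, val in row.items() if col != fk_column}
        nodes.append({"label": label, "properties": props})
        # A foreign key is a relationship: this row -> the referenced row.
        if fk_column is not None and row.get(fk_column) is not None:
            edges.append({"label": edge_label, "start": row[key], "end": row[fk_column]})
    return nodes, edges


employees = [
    {"id": 1, "name": "Gabriel Garcia Marquez", "manager_id": None, "title": "Boss"},
    {"id": 2, "name": "Dostoevsky", "manager_id": 1, "title": "Director"},
    {"id": 3, "name": "Victor Hugo", "manager_id": 1, "title": "Manager"},
]
nodes, edges = rows_to_graph(
    "employees", employees, fk_column="manager_id", edge_label="REPORTS_TO"
)
print(nodes)
print(edges)
```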
Since the question has no example schema, here is a dynamic way of creating a graph from a relational model.
First, create a PostgreSQL function that takes the row values as arguments, for example the name and title of a Person, and creates a node from them.
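CREATE OR REPLACE FUNCTION public.create_person(name text, title text)
RETURNS void
LANGUAGE plpgsql
VOLATILE
AS $BODY$
BEGIN
-- Load AGE and put ag_catalog on the search_path so cypher() resolves.
load 'age';
SET search_path TO ag_catalog, "$user", public;
-- %L quotes each value as a string literal ('Mark'). quote_ident would leave
-- simple lower-case values unquoted, and Cypher would read them as variables.
-- Assumes the values contain no quotes or backslashes.
EXECUTE format('SELECT * FROM cypher(''graph_name'', $$CREATE (:Person {name: %L, title: %L})$$) AS (a agtype);', name, title);
END
$BODY$;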
Then call the function like so:
SELECT public.create_person(sql_person.name, sql_person.title)
FROM sql_schema.Person AS sql_person;
This creates a node for every row in sql_schema.Person.
To export data from a PostgreSQL table to an AGE graph, you can also go through a CSV file. For example, say you have the following table called employees:
SELECT * from employees;
id | name | manager_id | title
----+------------------------+------------+------------
1 | Gabriel Garcia Marquez | | Boss
2 | Dostoevsky | 1 | Director
3 | Victor Hugo | 1 | Manager
4 | Albert Camus | 2 | Engineer
5 | Haruki Murakami | 3 | Analyst
6 | Virginia Woolf | 1 | Consultant
7 | Liu Cixin | 2 | Manager
8 | Franz Kafka | 4 | Intern
9 | Daphne Du Maurier | 7 | Engineer
First, export a CSV file with the following command:
\copy (SELECT * FROM employees) to '/home/username/employees.csv' with csv header
Now you can import this into AGE. Remember that for a graph database, the name of the table is the name of the vertex label. The columns of the table are the properties of the vertex.
First, make sure you create a label for your graph. In this case, the label name will be 'employees', the same as the table name.
SELECT create_vlabel('graph_name','employees');
Now we load all the nodes of this label (each row from the original table is one node in the graph).
SELECT load_labels_from_file('graph_name','employees','/home/username/employees.csv');
Now your graph should have all the table data of the employees table.
More information can be found in the documentation:
https://age.apache.org/age-manual/master/intro/agload.html
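The same CSV route can also carry the relationships. The Python sketch below turns the manager_id column of the exported employees CSV into one edge per employee-manager pair. It assumes AGE's edge loader expects the columns start_id, start_vertex_type, end_id and end_vertex_type, followed by any properties; check the agload page linked above for your AGE version. The function name is hypothetical, and only the first three sample rows are used. With csv header, \copy writes a NULL manager_id as an empty field, which the sketch treats as "no manager".

```python
import csv
import io
from typing import Dict, List

# Edge CSV header as assumed from the AGE loader documentation.
EDGE_HEADER = ["start_id", "start_vertex_type", "end_id", "end_vertex_type"]


def manager_edges_csv(rows: List[Dict[str, str]], vertex_label: str = "employees") -> str:
    """Hypothetical helper: build an edge CSV from the manager_id column."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(EDGE_HEADER)
    for row in rows:
        manager_id = row["manager_id"]
        if manager_id == "":  # NULL exported as an empty field: no manager
            continue
        # Edge from the employee vertex to the manager vertex, both labelled employees.
        writer.writerow([row["id"], vertex_label, manager_id, vertex_label])
    return buf.getvalue()


employees_csv = """id,name,manager_id,title
1,Gabriel Garcia Marquez,,Boss
2,Dostoevsky,1,Director
3,Victor Hugo,1,Manager
"""
rows = list(csv.DictReader(io.StringIO(employees_csv)))
edges_csv = manager_edges_csv(rows)
print(edges_csv)
```

You could then load the resulting file with create_elabel and load_edges_from_file, using an edge label of your choice (for example, a hypothetical REPORTS_TO).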
I don't think it's possible to create a graph directly from existing tables. When you create a graph, the graph name becomes a schema name, and the vertex and edge label names become table names. Create a sample graph and then run the command below to see which schemas and tables have been created in PostgreSQL.
SELECT * FROM pg_catalog.pg_tables

Know which tables are affected by a connection

I want to know if there is a way to retrieve which tables are affected by the requests made from a connection in PostgreSQL 9.5 or higher.
The purpose is to have the information in a form that tells me which tables were affected, in which order and in what way.
More precisely, something like this would suffice:
id | datetime | id_conn | id_query | table | action
---+----------+---------+----------+---------+-------
1 | ... | 2256 | 125 | user | select
2 | ... | 2256 | 125 | order | select
3 | ... | 2256 | 125 | product | select
(This would be the result of a select query joining user, order and product.)
I know I can retrieve id_conn through "pg_stat_activity", and I can see whether a query is running, but I can't find a "history" of queries.
The final purpose is to debug the database when incoherent data is inserted into a table (due to a missing constraint). Knowing which connection did the insert will lead me to the faulty script, since I already have the mapping between script names and connection ids.

How to properly index strings for lookup and excepts, the PostgreSQL way

Due to infrastructure costs, I've been studying the possibility to migrate a few databases to PostgreSQL. So far I am loving it. But there are a few topics I am quite lost. I need some guidance on one of them.
I have an ETL process that queries "deltas" in my database and imports the new data. To do so, I use lookup tables that store hashbytes of some strings to facilitate the lookup. This works in SQL Server, but apparently things work quite differently in PostgreSQL. In SQL Server, using hashbytes + except is suggested when working with millions of rows.
Let's suppose the following table
+----+-------+------------------------------------------+
| Id | Name | hash_Name |
+----+-------+------------------------------------------+
| 1 | Mark | 31e9697d43a1a66f2e45db652019fb9a6216df22 |
| 2 | Pablo | ce7169ba6c7dea1ca07fdbff5bd508d4bb3e5832 |
| 3 | Mark | 31e9697d43a1a66f2e45db652019fb9a6216df22 |
+----+-------+------------------------------------------+
And my lookup table
+------------------------------------------+
| hash_Name |
+------------------------------------------+
| 31e9697d43a1a66f2e45db652019fb9a6216df22 |
+------------------------------------------+
When querying new data (Pablo's hash), I can get it with the simplified query below:
SELECT hash_name
FROM mytable
EXCEPT
SELECT hash_name
FROM mylookup
Thinking the PostgreSQL way, how could I achieve this? Should I index and use EXCEPT? Or is there a better way of doing so?
From my research, I couldn't find much regarding storing hashbytes. Apparently, it is a matter of creating indexes and choosing the right index for the job. More precisely: BTREE for single field indexes and GIN for multiple field indexes.
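As a hedged illustration, the sketch below runs the EXCEPT query from the question and also a NOT EXISTS anti-join, the usual alternative, against the sample data. It uses Python's built-in sqlite3 module as a stand-in so it runs without a server. It also uses hashlib.sha1 to produce 40-character hex digests like the ones in the tables above; whether the original hashes were SHA-1 is an assumption. Both queries are plain SQL that PostgreSQL also accepts, and the index name is hypothetical.

```python
import hashlib
import sqlite3


def sha1_hex(value: str) -> str:
    """40-character hex digest, assumed to match the hash_Name column format."""
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL database
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, hash_name TEXT)")
conn.execute("CREATE TABLE mylookup (hash_name TEXT PRIMARY KEY)")  # PK => unique index
conn.execute("CREATE INDEX mytable_hash_name_idx ON mytable (hash_name)")

conn.executemany(
    "INSERT INTO mytable (id, name, hash_name) VALUES (?, ?, ?)",
    [(i, n, sha1_hex(n)) for i, n in [(1, "Mark"), (2, "Pablo"), (3, "Mark")]],
)
conn.execute("INSERT INTO mylookup (hash_name) VALUES (?)", (sha1_hex("Mark"),))

# 1) Set difference, as in the question. EXCEPT also removes duplicates.
except_rows = conn.execute(
    "SELECT hash_name FROM mytable EXCEPT SELECT hash_name FROM mylookup"
).fetchall()

# 2) Anti-join: keep rows with no matching hash in the lookup table.
anti_join_rows = conn.execute(
    """
    SELECT DISTINCT t.hash_name
    FROM mytable AS t
    WHERE NOT EXISTS (
        SELECT 1 FROM mylookup AS l WHERE l.hash_name = t.hash_name
    )
    """
).fetchall()

print(except_rows, anti_join_rows)
```

EXCEPT has set semantics, so the two Mark rows collapse into one. The NOT EXISTS form needs DISTINCT to give the same result. In PostgreSQL, the primary key on mylookup.hash_name provides a B-tree index that the planner can use when executing the anti-join.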

Getting duplicate rows when querying Cloud SQL in AppMaker

I migrated from Drive tables to a 2nd gen MySQL Google Cloud SQL data model. I was able to insert 19 rows into the following Question table in AppMaker:
+-------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+-------+
| SurveyType | varchar(64) | NO | PRI | NULL | |
| QuestionNumber | int(11) | NO | PRI | NULL | |
| QuestionType | varchar(64) | NO | | NULL | |
| Question | varchar(512) | NO | | NULL | |
| SecondaryQuestion | varchar(512) | YES | | NULL | |
+-------------------+--------------+------+-----+---------+-------+
I queried the data from the command line and know it is good. However, when I query the data in AppMaker like this:
var newQuery = app.models.Question.newQuery();
newQuery.filters.SurveyType._equals = surveyType;
newQuery.sorting.QuestionNumber._ascending();
var allRecs = newQuery.run();
I get 19 rows with the same data (the first row) instead of the 19 different rows. Any idea what is wrong? Additionally (and possibly related) my list rows in AppMaker are not showing any data. I did notice that _key is not being set correctly in the records.
(Edit: I thought maybe having two columns as the primary key was the problem, but I tried having the PK be a single identity column, same result.)
Thanks for any tips or pointers.
You have two primary key fields in your table, which is problematic according to the App Maker Cloud SQL documentation: https://developers.google.com/appmaker/models/cloudsql
App Maker can only write to tables that have a single primary key
field—If you have an existing Google Cloud SQL table with zero or
multiple primary keys, you can still query it in App Maker, but you
can't write to it.
This may explain why the view cannot properly display each row or set the _key.
I was able to get this to work by creating the table inside AppMaker rather than using a table created directly in the Cloud Shell. Not sure if existing tables are not supported or if there is a bug in AppMaker, but since it is working I am closing this.

Querying a Postgres View

I created a view using the new Postgres Studio from Heroku. I'm unable to query the view in my code. I can list the view using \dv as shown below but if I attempt a select * from x I get a relation "x" does not exist error. So how do I create and query views from Heroku?
psql> \dv
List of relations
Schema | Name | Type | Owner
--------+--------------------+------+----------------
public | x | view | me
public | pg_stat_statements | view | me
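PostgreSQL folds unquoted identifiers to lower case, while quoted identifiers are case-sensitive.
So x and "X" are different names: select * from X actually looks for x.
If the view was created with an upper-case name, you should use select * from "X"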
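The folding rule can be sketched in a few lines of Python. This is a simplified model for illustration only: normalize_identifier is a hypothetical helper, not a PostgreSQL API, and it ignores details such as non-ASCII case folding and identifier length limits.

```python
def normalize_identifier(ident: str) -> str:
    """Simplified model of how PostgreSQL resolves an identifier's stored name."""
    if len(ident) >= 2 and ident.startswith('"') and ident.endswith('"'):
        # Quoted: keep the case exactly, and turn doubled quotes ("") into one.
        return ident[1:-1].replace('""', '"')
    # Unquoted: folded to lower case.
    return ident.lower()


for name in ["x", "X", '"X"']:
    print(name, "->", normalize_identifier(name))
```

With this rule, from X and from x both look up x, while only from "X" reaches a relation whose stored name is the upper-case X.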