Merging nodes and preserving unique relations

I have two nodes with the same label and want to merge them so that only unique relations (pointing to the same end node) are saved. I have studied APOC but can't build a working Cypher script. Help me!

Related

Moving records from one database instance to another

I have a PostgreSQL database instance located in EU region. I plan on introducing another PostgreSQL database instance located in a new geographical region.
As part of this work, I am to migrate data for selected customers from a database instance in the EU to a database instance in this new geo region, and I am seeking advice.
On the surface, this boils down to doing the following work:
given a specific accounts.id,
find and copy the record from accounts table from EU database instance to accounts table in another region's database instance,
identify and copy records across all tables that are related to given account record, recursively (e.g. as well as potentially from tables related to those tables...).
Effectively, having a specific DB record as starting point, I need to:
build a hierarchy, or rather a graph, of DB records across all available tables, all directly (or indirectly) related to the "starting point" record (all possible relations could, perhaps, be established based on foreign key constraints),
for each record found across all tables, generate a string containing an INSERT statement,
replay all INSERT statements, in a transaction, on another database instance.
It appears as if I might need to build a tool to do this kind of work. But before I do, I wonder:
is there a common approach for implementing this?
if not, what might be a good starting point to approach this problem?
Indeed, you need a whole process to do this. I think you should create a new schema to select the data into (functions could do the magic), then replicate that data.
The replication tool is not that hard to configure.
Here is the link:
https://www.postgresql.org/docs/current/runtime-config-replication.html
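The traversal described in the question (start from one accounts row, follow foreign keys to dependent rows, then replay INSERTs in one transaction) can be sketched roughly as below. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 with in-memory databases as a stand-in for the two PostgreSQL instances, the tables orders and order_items and the FOREIGN_KEYS map are hypothetical, every table is assumed to have an id primary key, and only rows that reference the starting row (directly or indirectly) are followed. In PostgreSQL the foreign key map could instead be read from the system catalogs.

```python
import sqlite3
from collections import deque

# Hypothetical foreign-key map: child table -> [(child column, parent table, parent column)].
FOREIGN_KEYS = {
    "orders": [("account_id", "accounts", "id")],
    "order_items": [("order_id", "orders", "id")],
}

SCHEMA = """
CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, account_id INTEGER REFERENCES accounts(id));
CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER REFERENCES orders(id), sku TEXT);
"""

def collect_related_rows(conn, start_table, start_id):
    """Breadth-first walk from one row to every row that references it."""
    seen, ordered = set(), []
    queue = deque([(start_table, "id", start_id)])
    while queue:
        table, column, value = queue.popleft()
        cur = conn.execute(f"SELECT * FROM {table} WHERE {column} = ?", (value,))
        names = [d[0] for d in cur.description]
        for row in cur.fetchall():
            record = dict(zip(names, row))
            key = (table, record["id"])  # assumes an "id" primary key everywhere
            if key in seen:
                continue
            seen.add(key)
            ordered.append((table, record))  # parents are appended before children
            for child, fks in FOREIGN_KEYS.items():
                for child_col, parent, parent_col in fks:
                    if parent == table:
                        queue.append((child, child_col, record[parent_col]))
    return ordered

def to_insert(table, record):
    """Build a parameterized INSERT instead of pasting values into the string."""
    columns = ", ".join(record)
    placeholders = ", ".join("?" for _ in record)
    return f"INSERT INTO {table} ({columns}) VALUES ({placeholders})", tuple(record.values())

def copy_account(source, target, account_id):
    rows = collect_related_rows(source, "accounts", account_id)
    with target:  # one transaction: commit on success, roll back on error
        for table, record in rows:
            sql, params = to_insert(table, record)
            target.execute(sql, params)
    return len(rows)

source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.executescript(SCHEMA)
source.executescript("""
INSERT INTO accounts VALUES (1, 'acme'), (2, 'other');
INSERT INTO orders VALUES (10, 1), (11, 2);
INSERT INTO order_items VALUES (100, 10, 'A'), (101, 11, 'B');
""")
copied = copy_account(source, target, 1)
```

Because rows are collected breadth-first from the starting row, each parent is inserted before the rows that reference it. A real tool would also have to handle composite keys, self-references and rows that the copied rows themselves point to.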

Different select results when using multimaster via pglogical in PostgreSQL

There are two PostgreSQL 9.6 nodes subscribed to each other via pglogical. If node A inserts a row into the replicated table then node B sees it and vice versa.
However, when I update a row on one node, then subsequent SELECT queries on both nodes will keep returning different results - the current one and some of the previous ones.
Moreover, there are log entries about replication conflicts in the logs of both nodes.
Why does that happen and how do I fix that?
Update: setting pglogical.conflict_resolution to last_update_wins helps. I might consider other conflict resolution options too.
Multi-master replication is difficult.
There are conflicts that are bound to occur unless your application is aware of and specifically tailored to multi-master replication:
Rows inserted on different nodes with the same (automatically generated) primary key must conflict.
If you modify the primary key of a row on one node while updating or deleting it on another, the databases will “drift apart”, leading to future conflicts.
You will have to fix your application so that it avoids problems like the above, and you will have to manually find and resolve all the conflicts that occurred so far.
Here is an example of the second case:
-- node one:
UPDATE person
SET id = 1234
WHERE id = 6543;
-- at the same time on node two
DELETE FROM person
WHERE id = 6543;
Both statements will be replicated to the other node, but do nothing there, because neither node has a person with id 6543 any more. There will be no replication conflict right away, but node one now has a person that node two doesn't have. It is easy to see how this can lead to replication conflicts later (imagine you insert a row on node one that has a foreign key relationship to person 1234).
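To make the drift concrete, here is a toy model of the example above in Python. It is not how pglogical works internally: each node is just a dict keyed by id, the name "Alice" is made up, and the two helper functions are hypothetical stand-ins for the UPDATE and DELETE statements.

```python
def apply_update_id(table, old_id, new_id):
    """Model of: UPDATE person SET id = new_id WHERE id = old_id. Returns rows changed."""
    if old_id not in table:
        return 0
    table[new_id] = table.pop(old_id)
    return 1

def apply_delete(table, row_id):
    """Model of: DELETE FROM person WHERE id = row_id. Returns rows deleted."""
    return 1 if table.pop(row_id, None) is not None else 0

# Both nodes start out with the same row.
node_one = {6543: "Alice"}
node_two = {6543: "Alice"}

# Each node applies its own change locally first.
apply_update_id(node_one, 6543, 1234)
apply_delete(node_two, 6543)

# Then each change is replayed on the other node, where it matches no row.
replayed_on_two = apply_update_id(node_two, 6543, 1234)
replayed_on_one = apply_delete(node_one, 6543)

print(node_one)  # {1234: 'Alice'}
print(node_two)  # {}
```

Neither replay reports an error, yet the two nodes now hold different data.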
This is why it is in most cases a good idea to consider an architecture that does not include multi-master replication.

How does pglogical-2 handle logical replication on same table while allowing it to be writeable on both databases?

Based on the above image, there are certain tables I want to be in the Internal Database (right hand side). The other tables I want to be replicated in the external database.
In reality there's only one set of values that SHOULD NOT be replicated across. The rest of the database can be replicated. Basically the actual price columns in the prices table cannot be replicated across. It should stay within the internal database.
Because the vendors are external to the network, they have no access to the internal app.
My plan is to create a replicated version of the same app and allow vendors to submit quotations and picking items.
Let's say the replicated tables are at least quotations and quotation_line_items. These tables should be writeable (in terms of data for INSERTs, UPDATEs, and DELETEs) at both the external database and the internal database. Hence at both databases, the data in the quotations and quotation_line_items table are writeable and should be replicated across in both directions.
The data in the other tables are going to be replicated in a single direction (from internal to external) except for the actual raw prices columns in the prices table.
The quotation_line_items table will have a price_id column. However, the raw price values in the prices table should not appear in the external database.
Ultimately, I want the data to be consistent for the replicated tables on both databases. I am okay with synchronous replication, so a bit of delay (say, a couple of seconds for the write operations) is fine.
I came across pglogical https://github.com/2ndQuadrant/pglogical/tree/REL2_x_STABLE
and they have the concept of PUBLISHER and SUBSCRIBER.
I cannot tell based on the readme which one would be acting as publisher and subscriber and how to configure it for my situation.
That won't work. With the setup you are dreaming of, you will necessarily end up with replication conflicts.
How do you intend to prevent data from being modified in a conflicting fashion in the two databases? If you say that that won't happen, think again.
I believe that you would be much better off using a single database with two users: one that can access the “secret” table and one that cannot.
If you want to restrict access only to certain columns, use a view. Simple views are updateable in PostgreSQL.
It is possible with BDR replication, which uses pglogical. At a basic level, it works by allocating ranges of key ids to each node, so that writes are possible in both locations without conflict. However, BDR is now a commercial, paid-for product.
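The idea of giving each node its own range of key ids can be illustrated with a small Python sketch. This is not BDR's implementation; the function id_range, the node numbering and the range size of one million are assumptions made up for the example.

```python
def id_range(node_index, range_size=1_000_000):
    """Return the block of ids reserved for one node (hypothetical sizing)."""
    start = node_index * range_size + 1
    return range(start, start + range_size)

# Node 0 is the internal database, node 1 the external one (assumed numbering).
internal_ids = iter(id_range(0))
external_ids = iter(id_range(1))

# Each node hands out ids for new quotations from its own block only.
internal_rows = [next(internal_ids) for _ in range(3)]
external_rows = [next(external_ids) for _ in range(3)]

print(internal_rows)  # [1, 2, 3]
print(external_rows)  # [1000001, 1000002, 1000003]
```

Separate ranges only keep newly inserted keys from colliding. Conflicting updates or deletes of the same existing row on both nodes are still possible.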

How do I cluster my PRIMARY KEY in postgres

I noticed that when we create a table in postgres, it seems to automatically create a btree index on the PRIMARY KEY CONSTRAINT. Looking at the properties of the CONSTRAINT, it would appear it is not clustered. How do I cluster it, and should I cluster it?
You have to use the CLUSTER command:
CLUSTER stone.unitloaddetail USING pk10;
Remember that this rewrites the table and blocks it for others during that time.
Also, clustering is not maintained when table data are modified, so you have to schedule regular CLUSTER runs if you want to keep the table clustered.
Addressing the "should you" part, it depends on the likelihood of queries needing to access multiple rows having adjacent values of the clustering key.
For a table with a synthetic primary key, it probably makes more sense to cluster on a foreign key column.
Imagine that you have a table of products. Are you more likely to request multiple products having:
consecutive product_id?
the same location_id?
the same type_id?
the same manufacturer_id?
If it would solve a problem for you to improve the performance of the system for one of these particular cases, then that is the column by which you should consider clustering.
If doing so would not solve a problem, then do not do it.

orientdb migrating existing data to clusters

I have an OrientDB database with existing data.
But as the number of records in the db grows, my queries are getting slower every day.
I'm planning to cluster the classes with the highest number of records.
My plan is to cluster them according to the organization those records belong to.
I want to know the best way to cluster the existing data according to the organization field.
In OrientDB there's a command, MOVE VERTEX, which can be used to migrate an existing record to a different cluster.
MOVE VERTEX (SELECT FROM Person WHERE type = 'Customer') TO CLASS:Customer
http://orientdb.com/docs/last/SQL-Move-Vertex.html
This command can also reassign the edges, but any links pointing to the record you move will be broken.