OrientDB: migrating existing data to clusters

I have an OrientDB database with existing data.
But as the number of records in the DB grows, my queries are getting slower every day.
I'm planning to cluster the classes with the highest number of records.
My plan is to cluster them according to the organization those records belong to.
I want to know the best way to cluster the existing data according to the organization field.

In OrientDB there's a MOVE VERTEX command which can be used to migrate an existing record to a different cluster.
MOVE VERTEX (SELECT FROM Person WHERE type = 'Customer') TO CLASS:Customer
http://orientdb.com/docs/last/SQL-Move-Vertex.html
This command can reassign the edges as well, but if you have LINKs pointing to the record you move, those will be broken.
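For the organization-based split specifically, a minimal sketch could look like the following; the person_acme cluster name and the organization field are hypothetical, standing in for your own schema:
ALTER CLASS Person ADDCLUSTER person_acme
MOVE VERTEX (SELECT FROM Person WHERE organization = 'Acme') TO CLUSTER:person_acme BATCH 100
The first statement adds the new cluster to the class, the second moves the matching vertices into it in batches; repeat per organization. Queries can then target a single cluster with SELECT FROM cluster:person_acme.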

Related

Specified types or functions (one per INFO message) not supported on Redshift tables

select distinct(table_name) from svv_all_columns ;
SELECT distinct(source_table_name) FROM "ctrl_stg_cdr_rules" order by source_table_name ;
I want to take the intersection of the above two queries but I am getting this error:
ERROR: Specified types or functions (one per INFO message) not
supported on Redshift tables. [ErrorId:
1-63eb4c35-4b45b94c02210a19663d78db]
SELECT table_name FROM svv_all_columns WHERE database_name = 'singh_sandbox' AND schema_name = 'new_sources'
INTERSECT
SELECT source_table_name FROM ctrl_stg_cdr_rules
ORDER BY table_name;
I expect to get a list of all the missing tables.
Oh, this error again. This has to be one of the worst-written error messages in existence. What it means is that you are trying to use leader-node-only data in a query being run on the compute nodes.
You see, Redshift is a cluster with a leader node that has a different purpose than the other nodes (the compute nodes). When a query runs, the compute nodes execute on data they have direct access to, then the results from the compute nodes are passed to the leader node for any final actions and passed on to the connected client. In this model the data only flows one way during query execution. This error happens when data accessible only from the leader node is needed by the compute nodes - this includes results from leader-only functions and/or leader-node-only tables. That is what is happening when you perform an INTERSECT between these two SELECTs.
To resolve this you need to produce the leader-only data with a separate SELECT and route the data back to the compute nodes through a supported process. There are two classes of methods to do this - have an external system route the data back, OR use a cursor and route the results back. I wrote up how to perform the cursor approach in this answer: How to join System tables or Information Schema tables with User defined tables in Redshift
The bottom line is that you cannot do what you intend simply because of the architecture of Redshift. You need a different approach.
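For illustration, a hedged sketch of the external-system route: run the leader-only query on its own, have the client load its result set into a regular (compute-node) table, and only then perform the set operation. The temp table and the inserted values here are hypothetical placeholders for whatever the client captured.
-- step 1: run the leader-node-only query by itself and capture the rows client-side
SELECT table_name FROM svv_all_columns
WHERE database_name = 'singh_sandbox' AND schema_name = 'new_sources';
-- step 2: load those rows into a temp table that lives on the compute nodes
CREATE TEMP TABLE leader_table_names (table_name VARCHAR(128));
INSERT INTO leader_table_names VALUES ('example_table_1'), ('example_table_2');
-- step 3: the INTERSECT now runs entirely against compute-node data
SELECT table_name FROM leader_table_names
INTERSECT
SELECT source_table_name FROM ctrl_stg_cdr_rules
ORDER BY table_name;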

PostgreSQL database design on multiple disks

Currently I have one physical machine with a few SSD disks and a fresh PostgreSQL installation.
I'll load ~1-2 TB of data into a few distinct tables (they have no interconnection between themselves), where each table holds a distinct data entity.
I am thinking about two approaches:
Create a DB (with the corresponding table for its data entity) on each disk, one per entity.
Create one DB but store each table for the corresponding data entity on a separate disk.
So, my question is as follows: which approach is preferred, and which can be achieved at lower cost?
Eagerly waiting for your advice, comrades
You can answer the question yourself.
Are the data used by the same application?
Are the data from these tables joined?
Should these tables always be started and stopped together and have the same PostgreSQL version?
If yes, then they had best be stored together in a single database. Create three logical volumes, each striped across your SSDs: one for the data, one for pg_wal, and one for the logs.
If not, you might be best off with a database or a database cluster per table.
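If you do end up preferring the second approach (one database, each table on its own disk), tablespaces are the PostgreSQL mechanism for placing tables on specific disks. A minimal sketch, with hypothetical mount points and table definitions (the directories must already exist and be owned by the postgres OS user):
CREATE TABLESPACE entity_a_ts LOCATION '/mnt/ssd1/pgdata';
CREATE TABLESPACE entity_b_ts LOCATION '/mnt/ssd2/pgdata';
CREATE TABLE entity_a (id bigint PRIMARY KEY, payload jsonb) TABLESPACE entity_a_ts;
CREATE TABLE entity_b (id bigint PRIMARY KEY, payload jsonb) TABLESPACE entity_b_ts;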

Moving records from one database instance to another

I have a PostgreSQL database instance located in EU region. I plan on introducing another PostgreSQL database instance located in a new geographical region.
As part of this work, I am to migrate data for selected customers from the database instance in the EU to the database instance in the new geo region, and I am seeking advice.
On a surface, this boils down to doing the following work:
given a specific accounts.id,
find and copy the record from the accounts table in the EU database instance to the accounts table in the other region's database instance,
identify and copy records across all tables that are related to the given account record, recursively (i.e. potentially also from tables related to those tables...).
Effectively, having a specific DB record as starting point, I need to:
build a hierarchy, or rather a graph, of DB records across all available tables, all directly (or indirectly) related to the "starting point" record (all possible relations could perhaps be established based on foreign key constraints),
for each record found across all tables, generate a string containing an INSERT statement,
replay all INSERT statements, in a transaction, on another database instance.
It appears as if I might need to build a tool to do this kind of work. But before I do, I wonder:
is there a common approach for implementing this?,
if not, what might be a good starting point to approach this problem?
Indeed, you need a whole process to do this. I think you should create a new schema to do the data selection; functions could do the magic there, and then you replicate that data.
The replication tooling is not that hard to configure.
Here is the link:
https://www.postgresql.org/docs/current/runtime-config-replication.html
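As a starting point for the "build a graph of related records" step, the foreign-key metadata in the system catalogs can tell you which tables reference accounts. A sketch, assuming only the accounts table name from the question:
SELECT
    con.conrelid::regclass  AS referencing_table,
    att.attname             AS referencing_column,
    con.confrelid::regclass AS referenced_table
FROM pg_constraint con
JOIN pg_attribute att
  ON att.attrelid = con.conrelid
 AND att.attnum = ANY (con.conkey)
WHERE con.contype = 'f'
  AND con.confrelid = 'accounts'::regclass;
Running this for each table you discover (or wrapping it in a recursive CTE) gives you the graph of tables to walk when generating the INSERT statements.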

How does pglogical-2 handle logical replication on same table while allowing it to be writeable on both databases?

Based on the above image, there are certain tables I want to keep only in the internal database (right-hand side). The other tables I want replicated to the external database.
In reality there's only one set of values that SHOULD NOT be replicated across. The rest of the database can be replicated. Basically the actual price columns in the prices table cannot be replicated across. It should stay within the internal database.
Because the vendors are external to the network, they have no access to the internal app.
My plan is to create a replicated version of the same app and allow vendors to submit quotations and picking items.
Let's say the replicated tables are at least quotations and quotation_line_items. These tables should be writeable (in terms of data for INSERTs, UPDATEs, and DELETEs) at both the external database and the internal database. Hence at both databases, the data in the quotations and quotation_line_items table are writeable and should be replicated across in both directions.
The data in the other tables is going to be replicated in a single direction (from internal to external), except for the actual raw price columns in the prices table.
The quotation_line_items table will have a price_id column. However, the raw price values in the prices table should not appear in the external database.
Ultimately, I want the data in the replicated tables to be consistent on both databases. I am okay with synchronous replication, so a bit of delay (say, a couple of seconds for the write operations) is fine.
I came across pglogical https://github.com/2ndQuadrant/pglogical/tree/REL2_x_STABLE
and they have the concept of PUBLISHER and SUBSCRIBER.
I cannot tell, based on the README, which database would act as publisher and which as subscriber, or how to configure it for my situation.
That won't work. With the setup you are dreaming of, you will necessarily end up with replication conflicts.
How do you want to prevent data from being modified in a conflicting fashion in the two databases? If you say that won't happen, think again.
I believe that you would be much better off using a single database with two users: one that can access the “secret” table and one that cannot.
If you want to restrict access only to certain columns, use a view. Simple views are updateable in PostgreSQL.
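A minimal sketch of that single-database approach, with a hypothetical role name and view columns (only the prices, quotations and quotation_line_items tables and the price column come from the question):
-- role used by the external, vendor-facing app
CREATE ROLE vendor_app LOGIN;
-- vendors get no direct access to the raw prices table
REVOKE ALL ON prices FROM vendor_app;
-- a simple (and therefore updatable) view that leaves out the price column
CREATE VIEW prices_public AS
    SELECT id, item_id, currency
    FROM prices;
GRANT SELECT ON prices_public TO vendor_app;
GRANT SELECT, INSERT, UPDATE, DELETE ON quotations, quotation_line_items TO vendor_app;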
It is possible with BDR replication, which builds on pglogical. On a basic level it works by allocating ranges of key IDs to each node, so writes are possible in both locations without conflict. However, BDR is now a commercial, paid-for product.
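To illustrate the ranges-of-key-IDs idea generically (this is not BDR-specific syntax, and the sequence and table definitions are hypothetical): each node hands out non-overlapping IDs, so both sides can insert without colliding.
-- on the internal node: odd ids
CREATE SEQUENCE quotations_id_seq START 1 INCREMENT BY 2;
-- on the external node: even ids
CREATE SEQUENCE quotations_id_seq START 2 INCREMENT BY 2;
-- both nodes default the primary key to their own sequence
CREATE TABLE quotations (
    id bigint PRIMARY KEY DEFAULT nextval('quotations_id_seq'),
    created_at timestamptz DEFAULT now()
);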

OrientDB Teleporter - Pull only selected columns for Vertex from RDBMS

I am trying to pull data from an Oracle RDBMS and move it to OrientDB using Teleporter. My relational database has many columns and maintained E-R relationships. I have two questions:
My objective is to pull only a few columns (those that hold the unique identity and the foreign-key relations), not all the bulky column data. Is there any configuration with which I could do so? Today, include and exclude only work at the whole-table level.
Another objective is to keep my graph DB in sync with the selected table/column data I pushed in the previous run. Additional data that arrives in the RDBMS I would want in my graph DB too.
You can enjoy this feature, and others, in OrientDB 3.0 through a JSON configuration, but there is no documentation about it yet. Currently in 2.2.x you can only configure relationships and edges, as described here:
http://orientdb.com/docs/2.2.x/Teleporter-Import-Configuration.html
In the next two weeks all these features will also be available in 2.2.x and will be well documented, in order to make the configuration easy to understand.
At the moment you can adopt the following workaround:
import all the columns for each table into the corresponding vertex as usual;
drop the properties you are not interested in after each sync. You could write a script that runs the Teleporter execution and then deletes the properties you don't care about from the schema.
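For the second step, the cleanup could be as simple as two OrientDB SQL statements run after each Teleporter sync; the Person class and bulky_column property here are hypothetical. The first statement removes the values from the records, the second removes the property from the schema:
UPDATE Person REMOVE bulky_column
DROP PROPERTY Person.bulky_column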
I will update here when the alignment with 3.0 and the documentation are complete.